### What is pandas?
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "<b>pan</b>el <b>da</b>ta", an econometrics term for multidimensional, structured data sets.

(Source: https://en.wikipedia.org/wiki/Pandas_(software))

### Today, we'll be exploring a Pokemon Dataset! 
![Pokemon][logo]

[logo]: http://www.mtv.co.uk/sites/default/files/styles/image-w-760-scale/public/mtv_uk/galleries/large/2016/11/21/landscape-1456483171-pokemon2.jpg?itok=1cVzFcqr

#### About the Dataset: 
This data set includes 721 Pokemon, including their number, name, first and second type, and basic stats: HP, Attack, Defense, Special Attack, Special Defense, and Speed. It has been of great use when teaching statistics to kids. With certain types you can also give a geeky introduction to machine learning.

These are the raw attributes used to calculate how much damage an attack will do in the games. This dataset is about the Pokemon games (NOT Pokemon cards or Pokemon Go).

The data as described by Myles O'Neill is:

- <b>#</b>: ID for each pokemon
- <b>Name</b>: Name of each pokemon
- <b>Type 1</b>: Primary type; each pokemon has a type, this determines weakness/resistance to attacks
- <b>Type 2</b>: Secondary type; some pokemon are dual type and have 2
- <b>Total</b>: sum of all stats that come after this, a general guide to how strong a pokemon is
- <b>HP</b>: hit points, or health, defines how much damage a pokemon can withstand before fainting
- <b>Attack</b>: the base modifier for normal attacks (e.g. Scratch, Punch)
- <b>Defense</b>: the base damage resistance against normal attacks
- <b>SP Atk</b>: special attack, the base modifier for special attacks (e.g. fire blast, bubble beam)
- <b>SP Def</b>: the base damage resistance against special attacks
- <b>Speed</b>: determines which pokemon attacks first each round

The data for this table has been acquired from several different sites, including:

- pokemon.com
- pokemondb
- bulbapedia
    
Source: https://www.kaggle.com/abcsds/pokemon

### Getting Started
1. Download: [Pokemon dataset](https://github.com/wwcodemanila/WWCodeManila-ML.AI/blob/master/datasets/pokemon.csv)
2. Import the necessary libraries (`pandas`, `numpy`, `matplotlib.pyplot`)
3. Load the dataset 
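
As a rough sketch of steps 2 and 3, the snippet below imports the three libraries and loads a CSV with `pd.read_csv`. To keep it runnable without the download, it reads a tiny in-memory stand-in (`sample_csv`, a made-up name) with a handful of columns; the rows are illustrative and the file name in the comment is an assumption about where you saved the dataset.

```python
import io

import matplotlib.pyplot as plt  # used later for the bar charts
import numpy as np
import pandas as pd

# Stand-in for the downloaded file: a few rows in a similar column layout.
# With the real file, pass its path instead, e.g. pd.read_csv("pokemon.csv")
# (the path is an assumption; point it at wherever you saved the download).
sample_csv = io.StringIO(
    "#,Name,Type 1,Type 2,Total,HP\n"
    "1,Bulbasaur,Grass,Poison,318,45\n"
    "4,Charmander,Fire,,309,39\n"
    "7,Squirtle,Water,,314,44\n"
)

df = pd.read_csv(sample_csv)
print(df)
```

Empty cells (such as a missing secondary type) are loaded as `NaN`, which is worth keeping in mind for the later exercises.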

### Warmup
Note: Most commands for this section can be found in our [first machine learning project](https://github.com/wwcodemanila/WWCodeManila-ML.AI/blob/master/tutorials/Intro-to-Machine-Learning.ipynb).

1. How many pokemon are there in the dataset?
2. What are the columns in the dataset? 
3. What are the datatypes of each column?
4. What are the first 5 pokemon in the dataset?
5. What are the last 5 pokemon in the dataset?
6. Print a summary of the data (count, mean, min, max, etc.)
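
The warmup questions map onto a handful of standard DataFrame attributes and methods. The sketch below shows them on a small toy table (the rows are illustrative, not the full dataset), so the printed answers will differ from the ones you get on the real file.

```python
import pandas as pd

# Toy table for illustration only; load the real CSV for the actual answers.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pidgey", "Butterfree", "Mewtwo"],
    "Type 1": ["Grass", "Fire", "Water", "Normal", "Bug", "Psychic"],
    "Total": [318, 309, 314, 251, 395, 680],
    "HP": [45, 39, 44, 40, 60, 106],
})

print(len(df))        # number of rows
print(df.shape)       # (rows, columns)
print(df.columns)     # column labels
print(df.dtypes)      # datatype of each column
print(df.head())      # first 5 rows (5 is the default)
print(df.tail())      # last 5 rows
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
```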

### Cleaning the Data
1. Right now, the dataset is indexed by integers from 0 to 799 (a `RangeIndex` whose `stop` value of 800 is exclusive). You can confirm this by printing `df.index`. Suppose we want to index the Pokemon by 'Name' instead of by number. Set the 'Name' column as the index. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html)
2. Drop the '#' column and inspect your dataset for changes. [Hint1](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html) [Hint2](https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean)
3. - Option 1: Remove all spaces in the column names (e.g. 'Type 1' becomes 'Type1'); 
   - Option 2: Replace all spaces in the column names with underscores (e.g. 'Type 1' becomes 'Type_1'). [Hint](https://stackoverflow.com/questions/30763351/removing-space-in-dataframe-python)
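
One possible way to chain the three cleaning steps is sketched below on a toy table. It reassigns `df` at each step rather than using `inplace=True`, which is a style choice and not the only option.

```python
import pandas as pd

# Toy table for illustration only.
df = pd.DataFrame({
    "#": [1, 4, 7],
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Type 2": ["Poison", None, None],
})
print(df.index)  # a RangeIndex before any cleaning

df = df.set_index("Name")   # 1. index the rows by name
df = df.drop("#", axis=1)   # 2. axis=1 means "drop a column", not a row

# 3. Option 2: replace spaces with underscores.
#    For Option 1, use df.columns.str.replace(" ", "") instead.
df.columns = df.columns.str.replace(" ", "_")

print(df.columns.tolist())  # ['Type_1', 'Type_2']
```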

### Basic Selection
1. Retrieve the row data of the first pokemon in the dataset by <b>integer-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html)
2. Retrieve the row data of the first pokemon in the dataset by <b>label-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) 
3. Retrieve the row data of the <b>20th</b> pokemon in the dataset by <b>integer-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html)
4. Retrieve the row data of <b>Ninetales</b> in the dataset by <b>label-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) 
5. Retrieve the row data of your favorite Pokemon.  
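
The difference between the two indexers is easiest to see side by side. This sketch assumes the table is already indexed by 'Name' (as in the cleaning section) and uses toy rows: `iloc` takes a zero-based position, while `loc` takes an index label.

```python
import pandas as pd

# Toy table for illustration only, already indexed by name.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Total": [318, 309, 314],
}).set_index("Name")

first_by_position = df.iloc[0]         # integer-location: row at position 0
first_by_label = df.loc[df.index[0]]   # label-location: look up the first label
third = df.iloc[2]                     # positions start at 0, so the Nth row is iloc[N - 1]
squirtle = df.loc["Squirtle"]          # label-location with a known name

print(first_by_position)
print(squirtle)
```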

### Selection and Counting
Note: For the following items, refer to [this tutorial](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb).

1. Select just the 'Total' column of the first 5 Pokemon. [Hint](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb#2.2-Selecting-columns-and-rows)
2. Select just the Total and HP columns of the first 5 Pokemon. [Hint](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb#2.3-Selecting-multiple-columns)
3. How many Pokemon are Legendary? How many are not? [Hint](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb#2.4-What's-the-most-common-complaint-type?)
4. How many Pokemon belong to each Generation? 
5. How many Pokemon belong to each Primary type (e.g. Bug, Dark, etc.)?
6. How many Pokemon belong to each Secondary type (e.g. Bug, Dark, etc.)?
7. How many Pokemon belong to each Type in general (i.e. combine Primary and Secondary Types)? [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.add.html)
8. What's the most common Type of Pokemon?
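
The counting questions above mostly come down to `value_counts()`. The sketch below uses toy rows and shows one way to combine the primary and secondary counts with `Series.add`, where `fill_value=0` keeps types that appear in only one of the two columns.

```python
import pandas as pd

# Toy table for illustration only.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard", "Butterfree", "Pidgey", "Mewtwo"],
    "Type 1": ["Grass", "Fire", "Fire", "Bug", "Normal", "Psychic"],
    "Type 2": ["Poison", None, "Flying", "Flying", "Flying", None],
    "Total": [318, 309, 534, 395, 251, 680],
    "HP": [45, 39, 78, 60, 40, 106],
    "Legendary": [False, False, False, False, False, True],
})

print(df["Total"].head())          # one column, first 5 rows
print(df[["Total", "HP"]].head())  # several columns: pass a list of names

legendary_counts = df["Legendary"].value_counts()  # counts of False and True

primary = df["Type 1"].value_counts()
secondary = df["Type 2"].value_counts()            # missing values are skipped

# Without fill_value, a type missing from either side would become NaN.
combined = primary.add(secondary, fill_value=0).astype(int)
most_common = combined.idxmax()

print(combined.sort_values(ascending=False))
print(most_common)
```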

<b>Challenge: </b> Plot the bar graphs for items 3, 4, and 7! 
- Label the axes properly [Hint](https://stackoverflow.com/questions/42223587/plt-scatter-how-to-add-title-and-xlabel-and-ylabel)
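- Make sure to call `plt.show()` to display each figure. It is required when running a script, and in a Jupyter notebook it also keeps stray text output out of the cell.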
- Plus points if you can order the elements in the bar plot in descending order (i.e. highest first) [Hint1](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html) [Hint2](https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean)
- Afterwards, try plotting only the top 10 elements in each graph.
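
For the challenge, a minimal plotting sketch might look like the following. The counts in `type_counts` are made up for illustration; in the exercise you would plot the Series you computed from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts for illustration only.
type_counts = pd.Series({"Water": 5, "Fire": 3, "Grass": 4, "Bug": 2})

# Sort highest first, then keep only the top N entries (3 here, 10 in the exercise).
top = type_counts.sort_values(ascending=False).head(3)

ax = top.plot(kind="bar")
ax.set_xlabel("Type")
ax.set_ylabel("Number of Pokemon")
ax.set_title("Most common types (toy data)")
plt.tight_layout()
plt.show()
```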

<b>Note</b>: Congrats on making it this far! No more hints from this point forward, good luck! 

### Selection by Type
Note: For the following items, refer to [this tutorial](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%203%20-%20Which%20borough%20has%20the%20most%20noise%20complaints%3F%20%28or%2C%20more%20selecting%20data%29.ipynb).
1. Retrieve the first 3 Pokemon in the dataset whose Primary type is 'Flying'. 
2. Retrieve the first 3 Flying Type Pokemon (i.e. either Primary type is Flying or Secondary type is Flying)
3. Retrieve the first 3 Fire Type Pokemon (i.e. either Primary type is Fire or Secondary type is Fire)
4. Retrieve the first 5 Pokemon that are both Bug type and Flying type. 
5. Retrieve the first 5 Pokemon that are either Dragon type or Flying type (or both). 
6. - What are your favorite Pokemon Types? 
   - Choose one (or two) types, and retrieve all Pokemon that fit that criteria. (e.g. I like Pokemon that are both Psychic and Flying)
7. Retrieve the row data of <b>Charizard</b> and <b>Charmander</b> in a single table. (Hint: use `df.index`)
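
These selections can all be built from boolean masks. The sketch below, on toy rows indexed by name, combines masks with `|` (or) and `&` (and); the parentheses around each comparison are required because of Python's operator precedence. The helper names `is_flying` and `is_bug` are made up for this example.

```python
import pandas as pd

# Toy table for illustration only, indexed by name.
df = pd.DataFrame({
    "Name": ["Charmander", "Charizard", "Butterfree", "Pidgey", "Dragonite", "Squirtle"],
    "Type 1": ["Fire", "Fire", "Bug", "Normal", "Dragon", "Water"],
    "Type 2": [None, "Flying", "Flying", "Flying", "Flying", None],
}).set_index("Name")

# A Pokemon "is" a type if either its primary or its secondary type matches.
is_flying = (df["Type 1"] == "Flying") | (df["Type 2"] == "Flying")
is_bug = (df["Type 1"] == "Bug") | (df["Type 2"] == "Bug")

flying = df[is_flying].head(3)      # first 3 Flying-type rows
bug_and_flying = df[is_bug & is_flying]

# Several rows by name in one table, using the index.
pair = df[df.index.isin(["Charizard", "Charmander"])]

print(flying)
print(bug_and_flying)
print(pair)
```

Note that `isin` keeps the rows in their original order; `df.loc[["Charizard", "Charmander"]]` would return them in the order you list them.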

### Selection by Characteristic
Retrieve the name and statistics of the Pokemon as requested below.

<b>Bonus Challenge</b>: If more than one Pokemon ties in a category (e.g. several Pokemon might share the same highest Total value), print all of them.

1. Strongest Pokemon (i.e. has the highest 'Total' value) 
2. Pokemon with the highest HP
3. Pokemon with the highest Attack
4. Pokemon with the highest Defense
5. Pokemon with the highest Special Attack
6. Pokemon with the highest Special Defense
7. Pokemon with the highest Speed 
8. For items 1-7, get their lowest counterpart.
9. Print the top 10 strongest Pokemon in descending order. 
10. Print the 10 weakest Pokemon in ascending order.
11. Print the top 10 strongest legendary Pokemon in descending order. 

<b>Bonus:</b> Have questions of your own? (e.g. which Ghost Pokemon has the highest attack?) Answer them here!
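
One way to approach these, sketched on toy rows: compare a column against its own maximum to keep every tied row (which `idxmax` alone would not do), and use `sort_values` followed by `head` for the ranked lists.

```python
import pandas as pd

# Toy table for illustration only, indexed by name.
df = pd.DataFrame({
    "Name": ["Mewtwo", "Lugia", "Dragonite", "Pidgey", "Magikarp", "Caterpie"],
    "Total": [680, 680, 600, 251, 200, 195],
    "Legendary": [True, True, False, False, False, False],
}).set_index("Name")

# Highest Total, keeping ties: every row whose Total equals the maximum.
strongest = df[df["Total"] == df["Total"].max()]

# Ranked lists (3 rows here, 10 in the exercise).
top3 = df.sort_values("Total", ascending=False).head(3)
weakest3 = df.sort_values("Total").head(3)

# Filter first, then rank: strongest legendary Pokemon.
top_legendary = df[df["Legendary"]].sort_values("Total", ascending=False).head(2)

print(strongest)
print(top3)
print(weakest3)
```

For the lowest counterparts, swapping `max()` for `min()` in the same comparison should work.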

### Final Task
Suppose Professor Oak allows you to choose 5 Pokemon from his entire collection to be in your starter team. However, he has the following restrictions:
- Your first pokemon must be one of the 1st generation starter pokemon: <b>Bulbasaur, Squirtle, Charmander, Pikachu, Eevee</b>
- You cannot choose a legendary Pokemon or a Pokemon whose name contains the word "Mega" to be in your team. [Hint](https://stackoverflow.com/questions/17097643/search-for-does-not-contain-on-a-dataframe-in-pandas)
- No two Pokemon can have the same Primary type.
- You can only have at most 2 Pokemon from the same generation.

Your task is: 
1. Mine the dataset for additional information
2. Using what you've discovered, which 5 Pokemon will you choose and why? 
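
To get started, the restrictions can be expressed as filters and checks. The sketch below is one interpretation on toy rows, not the intended solution: the helper `team_is_valid` is a made-up name, and matching on `"Mega "` (with a trailing space) is an assumption about how Mega forms are named, chosen so that a name like Meganium is not excluded by accident.

```python
import pandas as pd

# Toy table for illustration only.
df = pd.DataFrame({
    "Name": ["Pikachu", "VenusaurMega Venusaur", "Meganium", "Mewtwo", "Eevee"],
    "Type 1": ["Electric", "Grass", "Grass", "Psychic", "Normal"],
    "Generation": [1, 1, 2, 1, 1],
    "Legendary": [False, False, False, True, False],
})

STARTERS = ["Bulbasaur", "Squirtle", "Charmander", "Pikachu", "Eevee"]

# Assumption: Mega forms contain "Mega " followed by a space.
is_mega = df["Name"].str.contains("Mega ", regex=False)
eligible = df[~df["Legendary"] & ~is_mega]


def team_is_valid(team):
    """Check the starter, primary-type, and generation restrictions for a candidate team."""
    starts_with_starter = team.iloc[0]["Name"] in STARTERS
    unique_types = team["Type 1"].is_unique
    max_per_generation = team["Generation"].value_counts().max()
    return bool(starts_with_starter and unique_types and max_per_generation <= 2)


print(eligible["Name"].tolist())  # ['Pikachu', 'Meganium', 'Eevee']
print(team_is_valid(eligible))
```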

### Congrats on finishing the exercise!