## Data Visualization with Seaborn: Categorical Plots

Seaborn provides an API on top of matplotlib that offers sensible plot and color defaults, simple functions for common statistical plot types, and integration with Pandas DataFrames. [[Source](https://www.oreilly.com/learning/data-visualization-with-seaborn)]

It’s useful to divide seaborn’s categorical plots into three groups:
- those that show each observation at each level of the categorical variable (swarmplot, stripplot)
- those that show a representation of the distribution of observations (boxplot, violinplot)
- those that compute a statistic for the second variable at each level, e.g. a mean or a count of observations (barplot, pointplot)

The first group includes the functions swarmplot() and stripplot(), the second includes boxplot() and violinplot(), and the third includes barplot() and pointplot().

<b>Resources</b>: [[Seaborn API](http://seaborn.pydata.org/api.html#categorical-plots)][[Seaborn - Plotting with Categorical Data](http://seaborn.pydata.org/tutorial/categorical.html#plotting-with-categorical-data)]

![alt text](images/pokemon.jpg)

### About the Dataset: 
---
This dataset includes 721 Pokemon, with their number, name, first and second type, and basic stats: HP, Attack, Defense, Special Attack, Special Defense, and Speed. These are the raw attributes used to calculate how much damage an attack will do in the games. The dataset covers the Pokemon games (NOT Pokemon cards or Pokemon Go).

- <b>#</b>: ID for each Pokemon
- <b>Name</b>: Name of each Pokemon
- <b>Type 1</b>: Primary type; each Pokemon has a type, which determines its weakness/resistance to attacks
- <b>Type 2</b>: Secondary type; some Pokemon are dual type and have two
- <b>Total</b>: Sum of all stats that come after this, a general guide to how strong a Pokemon is
- <b>HP</b>: Hit points, or health; defines how much damage a Pokemon can withstand before fainting
- <b>Attack</b>: The base modifier for normal attacks (e.g. Scratch, Punch)
- <b>Defense</b>: The base damage resistance against normal attacks
- <b>SP Atk</b>: Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)
- <b>SP Def</b>: Special defense, the base damage resistance against special attacks
- <b>Speed</b>: Determines which Pokemon attacks first each round

The data for this table was collected from several different sites, including pokemon.com, pokemondb, and Bulbapedia.
    
<b>Source</b>: https://www.kaggle.com/abcsds/pokemon

### Getting Started
---
1. Download: [Pokemon dataset](https://github.com/wwcodemanila/WWCodeManila-ML.AI/blob/master/datasets/pokemon.csv)
2. Import the necessary libraries (`pandas`, `numpy`, `matplotlib.pyplot`)
3. Import seaborn. The way to import seaborn is:
    ```python
    import seaborn as sns
    ```
4. Load the dataset 

In [6]:
# Write your code here
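
Steps 2 to 4 might look like the sketch below. It assumes the CSV was saved as `pokemon.csv` in the working directory (a hypothetical path, adjust as needed). So that the cell also runs without the download, it falls back to a three-row sample; the header spelling in that sample (for example `Sp. Atk`) is an assumption about the file, not something stated above.

```python
import io
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

CSV_PATH = "pokemon.csv"  # assumed local filename; change it to wherever you saved the file

# Fallback sample, used only when the file is missing.
# Column headers are assumed to match the downloaded CSV.
SAMPLE_CSV = """#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45
4,Charmander,Fire,,309,39,52,43,60,50,65
7,Squirtle,Water,,314,44,48,65,50,64,43
"""

source = CSV_PATH if os.path.exists(CSV_PATH) else io.StringIO(SAMPLE_CSV)
pokemon = pd.read_csv(source)

# Quick sanity checks: size, first rows, and column types.
print(pokemon.shape)
print(pokemon.head())
print(pokemon.dtypes)
```

Checking `pokemon.columns` before plotting is worthwhile, since the plotting functions refer to columns by their exact header text.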

#### 1. [Bar Plot](http://seaborn.pydata.org/tutorial/categorical.html#bar-plots)
- Show the bar plot containing the count of each Primary type (i.e. Type 1) of Pokemon. [[Hint](https://seaborn.pydata.org/generated/seaborn.countplot.html)]
    - Note that if the plot does not display properly, you may need to call `plt.show()`.
- For vertical charts: Notice how the x labels overlap? 
    - Option 1: Rotate the labels. [[Hint](https://stackoverflow.com/questions/39689352/plotting-bar-plot-in-seaborn-python-with-rotated-xlabels/39689464#39689464)]
    - Option 2: Convert it to a horizontal chart. 
- Try displaying the countplot in either ascending or descending order. [[Hint](https://stackoverflow.com/questions/46623583/seaborn-countplot-order-categories-by-count)]
- Write down your observations from the plot. Which type has the highest count? Which has the lowest?

In [7]:
# Write your code here
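
One possible approach, shown as a sketch rather than the expected answer: order the categories with `value_counts()` and draw both the rotated-label and the horizontal variant. The small DataFrame is an invented stand-in so the cell runs on its own; in the notebook, use the DataFrame loaded from the CSV instead.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Invented stand-in data; replace with the DataFrame loaded from the CSV.
pokemon = pd.DataFrame({"Type 1": ["Water", "Water", "Water", "Fire", "Fire", "Grass"]})

# value_counts() sorts from most to least frequent, which gives a descending order.
order = pokemon["Type 1"].value_counts().index

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: vertical bars with the x tick labels rotated.
sns.countplot(data=pokemon, x="Type 1", order=order, ax=ax_v)
ax_v.tick_params(axis="x", rotation=90)

# Option 2: horizontal bars, so the labels sit on the y axis.
sns.countplot(data=pokemon, y="Type 1", order=order, ax=ax_h)

plt.tight_layout()
```

Reversing the index (`order[::-1]`) gives ascending order. Outside a notebook, a final `plt.show()` may be needed to display the figure.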

#### 2. [Strip Plot](http://seaborn.pydata.org/tutorial/categorical.html#categorical-scatterplots)
- A strip plot is basically a [scatter plot](https://en.wikipedia.org/wiki/Scatter_plot) where one variable is categorical.
- Display the Attack of each Pokemon and its Primary Type using a strip plot. [[Hint](http://seaborn.pydata.org/generated/seaborn.stripplot.html#seaborn.stripplot)]
- What do you observe from the plot? What are the limitations of this type of plot?

In [8]:
# Write your code here
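
A minimal sketch of a strip plot follows. The data is an invented stand-in for the real dataset, and the `jitter` and `alpha` settings are just one reasonable choice for reducing overplotting.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Invented stand-in data; replace with the DataFrame loaded from the CSV.
pokemon = pd.DataFrame({
    "Type 1": ["Water", "Water", "Water", "Fire", "Fire", "Grass", "Grass"],
    "Attack": [48, 63, 83, 52, 64, 49, 62],
})

fig, ax = plt.subplots(figsize=(6, 4))

# jitter spreads the points sideways; alpha makes overlapping points visible.
sns.stripplot(data=pokemon, x="Type 1", y="Attack", jitter=True, alpha=0.6, ax=ax)
ax.tick_params(axis="x", rotation=90)
```

Even with jitter, points in a crowded category can still sit on top of each other, which is worth keeping in mind when answering the question about limitations.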

#### 3. [Swarm Plot](http://seaborn.pydata.org/generated/seaborn.swarmplot.html#seaborn.swarmplot)
- A swarm plot is another categorical scatter plot where the points <i>do not</i> overlap.
- Show the Total of each Pokemon and its Primary Type using a swarm plot. [[Hint](http://seaborn.pydata.org/generated/seaborn.swarmplot.html#seaborn.swarmplot)]
- Show the Attack of each Pokemon and its Primary Type using a swarm plot. 
- Show the Defense of each Pokemon and its Primary Type using a swarm plot.
- Notice the distribution of the points. What do you observe?

In [9]:
# Write your code here
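
The three swarm plots can be drawn in one loop, as sketched below. The data is randomly generated placeholder data (not real Pokemon stats), and the marker `size` is an arbitrary choice that may need lowering on the full dataset so that all points fit.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Random placeholder data; replace with the DataFrame loaded from the CSV.
rng = np.random.default_rng(0)
pokemon = pd.DataFrame({"Type 1": np.repeat(["Water", "Fire", "Grass"], 10)})
for stat in ["Attack", "Defense"]:
    pokemon[stat] = rng.integers(20, 120, size=len(pokemon))
# Stand-in only: the real Total is the sum of all six base stats.
pokemon["Total"] = pokemon["Attack"] + pokemon["Defense"]

stats = ["Total", "Attack", "Defense"]
fig, axes = plt.subplots(1, len(stats), figsize=(15, 4))

for ax, stat in zip(axes, stats):
    # swarmplot shifts points sideways so that they do not overlap.
    sns.swarmplot(data=pokemon, x="Type 1", y=stat, size=4, ax=ax)
    ax.set_title(stat)
    ax.tick_params(axis="x", rotation=90)

plt.tight_layout()
```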

At a certain point, the categorical scatterplot approach becomes limited in the information it can provide about the distribution of values within each category. There are several ways to summarize this information in ways that facilitate easy comparisons across the category levels. [[Source](http://seaborn.pydata.org/tutorial/categorical.html#distributions-of-observations-within-categories)]

#### 4. [Box and Whisker Plot](http://seaborn.pydata.org/tutorial/categorical.html#boxplots)
<img align="right" style="width: 200px;" src="https://datavizcatalogue.com/methods/images/anatomy/box_plot.png">
A box and whisker plot (sometimes called a boxplot) shows the three quartile values of the distribution along with extreme values. 

More specifically, this kind of plot is a graph that presents information from a five-number summary:
- <b>upper extreme</b>: the largest observation that is not an outlier, i.e. no greater than Q3 + 1.5 IQR
- <b>upper quartile (Q3)</b>: median of the upper half of the data set
- <b>median</b>: middle value of the data set
- <b>lower quartile (Q1)</b>: median of the lower half of the data set
- <b>lower extreme</b>: the smallest observation that is not an outlier, i.e. no less than Q1 – 1.5 IQR
- (Interquartile Range (IQR) = Upper Quartile (Q3) – Lower Quartile (Q1); observations beyond the extremes are drawn as individual outlier points)

[Learn more about Box and Whisker Plots here.](https://www.statcan.gc.ca/edu/power-pouvoir/ch12/5214889-eng.htm)
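
The quantities listed above can be computed by hand with NumPy, as in the sketch below. The values are invented for illustration. Note that `np.percentile` interpolates linearly by default, so its quartiles can differ slightly from the "median of each half" definition.

```python
import numpy as np

# Invented values for illustration, with one low and one high outlier.
attack = np.array([5, 45, 50, 55, 60, 65, 70, 75, 80, 190])

q1, median, q3 = np.percentile(attack, [25, 50, 75])
iqr = q3 - q1

# The 1.5 IQR limits beyond which a point counts as an outlier.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations that lie within the limits.
inside = attack[(attack >= lower_limit) & (attack <= upper_limit)]
lower_whisker, upper_whisker = inside.min(), inside.max()
outliers = attack[(attack < lower_limit) | (attack > upper_limit)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {lower_whisker} to {upper_whisker}, outliers: {outliers.tolist()}")
```

Here the limits are 17.5 and 107.5, so the whiskers stop at 45 and 80 while 5 and 190 would be drawn as separate outlier points.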

- Create 5 graphs which show, for each Primary type, the box plots of:
    - Total
    - Attack
    - Defense
    - Speed
    - HP
- Which types have the highest Total, Attack, Defense, etc. based on the box plots? 
- Which types have the lowest Total, Attack, Defense, etc. based on the box plots?
- What are your other observations?

In [10]:
# Write your code here
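
A sketch of one way to produce the five box plots in a loop, followed by a table of per-type medians that may help when comparing types. The data is randomly generated placeholder data, not real Pokemon stats, so the resulting pattern means nothing; swap in the DataFrame loaded from the CSV.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Random placeholder data; replace with the DataFrame loaded from the CSV.
rng = np.random.default_rng(0)
pokemon = pd.DataFrame({"Type 1": np.repeat(["Water", "Fire", "Grass"], 20)})
for stat in ["Attack", "Defense", "Speed", "HP"]:
    pokemon[stat] = rng.integers(20, 120, size=len(pokemon))
# Stand-in only: the real Total also includes the two special stats.
pokemon["Total"] = pokemon[["Attack", "Defense", "Speed", "HP"]].sum(axis=1)

stats = ["Total", "Attack", "Defense", "Speed", "HP"]
fig, axes = plt.subplots(len(stats), 1, figsize=(8, 20))

for ax, stat in zip(axes, stats):
    sns.boxplot(data=pokemon, x="Type 1", y=stat, ax=ax)
    ax.set_title(f"{stat} by Type 1")
    ax.tick_params(axis="x", rotation=90)

plt.tight_layout()

# The median line in each box corresponds to one cell of this table.
medians = pokemon.groupby("Type 1")[stats].median()
print(medians)
```

Sorting the table, for example `medians.sort_values("Total", ascending=False)`, gives a quick check on what the plots suggest.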