## A step-by-step guide to Data Visualizations in Python

Follow along with [this article](https://medium.com/codex/step-by-step-guide-to-data-visualizations-in-python-b322129a1540).  We will be using [this .xlsx dataset](https://www.kaggle.com/roshansharma/immigration-to-canada-ibm-dataset) from Kaggle on immigration to Canada from 1980–2013. There is no need to download it; it is already in the Git repository you forked.<br>

#### We suggest that you type the code out instead of copying and pasting it.  This will help you become more familiar with the syntax and understand it better.


### Step-1: Importing Packages

In [1]:
#import all libraries and modules needed
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from matplotlib import style

#setting style for graphs

style.use('ggplot')
plt.rcParams['figure.figsize'] = (20,10)


### Step-2 : Importing and Cleaning Data 

In [2]:
#import and clean data, remember path is data/Canada.xlsx
df = pd.read_excel('data/Canada.xlsx', sheet_name = 1, skiprows = range(20), skipfooter = 2)
df.rename(columns = {'OdName': 'country', 'AreaName': 'continent', 'RegName' : 'region'}, inplace = True)
#numeric_only = True skips the text columns, which newer pandas versions no longer ignore by default
df['total'] = df.sum(axis = 1, numeric_only = True)
df = df.set_index('country')
df.rename(index = {'United Kingdom of Great Britain and Northern Ireland': 'UK & Ireland'}, inplace = True)
df.columns = df.columns.astype(str)

#Useful for upcoming visualizations

#range() stops before its end value, so 2014 is needed to include 2013
years = list(map(str, range(1980, 2014)))
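
Why cast the column names to strings and build a list of year strings? A row label plus that list is enough to pull one country's time series out of the table. Here is a minimal sketch of the idea; the table `toy`, the list `toy_years`, and all the numbers are made up for illustration and are not from the dataset.

```python
import pandas as pd

# Made-up numbers; only the shape mirrors the immigration table
toy = pd.DataFrame(
    {"country": ["Haiti", "India"], 1980: [10, 30], 1981: [20, 40]}
).set_index("country")

# The year columns start out as integers, so cast them to strings
toy.columns = toy.columns.astype(str)

# range() stops before its end value: this yields '1980' and '1981'
toy_years = list(map(str, range(1980, 1982)))

# A row label plus the list of year strings selects one country's series
haiti = toy.loc["Haiti", toy_years]
print(haiti)
```

The same pattern, df.loc['Haiti', years], is what the line charts in Step-3 plot.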

### Step-3 : Creating Beautiful Visualizations

#### Line Chart

In [1]:
# Single line chart
fig1 = df.loc['Haiti', years].plot(kind = 'line', color = 'r')
plt.title('Immigration from Haiti to Canada from 1980-2013', color = 'black')
plt.xlabel('Years', color = 'black')
plt.ylabel('Number of Immigrants', color = 'black')
plt.xticks(color = 'black')
plt.yticks(color = 'black')
plt.savefig('linechart_single.png')

plt.show()


In [2]:
# Multiple Line chart 
fig2 = plt.plot(df.loc['India', years], label = 'India')
plt.plot(df.loc['China', years],  label = 'China')
plt.plot(df.loc['Philippines', years], label = 'Philippines')
plt.legend(loc = 'upper left', fontsize = 12)
plt.xticks(rotation = 90, color = 'black')
plt.yticks(color = 'black')
plt.title('Immigration to Canada from 1980-2013', color = 'black')
plt.xlabel('Year' ,color = 'black')
plt.ylabel('Number of Immigrants' ,color = 'black')
plt.savefig('linechart_multiple.png')

plt.show()

#### Let's talk about style

In [3]:
#Shows all available built-in styles
print(plt.style.available)

To see a visualization of the available style sheets, [click here](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html). 

The syntax to select a specific style is: plt.style.use('style_name') 
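
If you would rather try a style without changing every later graph, Matplotlib also offers plt.style.context, which applies a style only inside a with block. The sketch below is one possible way to do that; the variable `style_name` and the plotted numbers are placeholders chosen for illustration.

```python
import matplotlib.pyplot as plt

# Any name printed by plt.style.available can go here
style_name = "ggplot"

# The style applies only to figures created inside the with block
with plt.style.context(style_name):
    fig, ax = plt.subplots()
    ax.plot([1980, 1990, 2000], [1, 3, 2], label="made-up values")
    ax.set_title("Drawn with the " + style_name + " style")
    ax.legend(loc="upper left")
```

plt.style.use('style_name'), by contrast, stays in effect for every graph drawn afterwards until you select another style.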

Try it out by adding the line of code to the top of the code block above and choose one of the preinstalled styles. Which style is your favorite?<br><br>

What happens when you change the line plt.legend(loc = 'upper left', fontsize = 12) to plt.legend(loc = 'lower right', fontsize = 12)? <br><br><br><br>



Experiment with changing other lines of the code and see how the graph changes. Add any notes or observations here. Going forward, feel free to experiment with each graph type.<br><br><br><br><br><br>

#### Install mplcyberpunk
Open a terminal window and at the prompt type:

python -V

If it's Python 3.something, copy and paste: pip install mplcyberpunk<br>
If it's Python 2.something, copy and paste: pip3 install mplcyberpunk


For more info on mplcyberpunk click [here.](https://github.com/dhaitz/mplcyberpunk)

In [4]:
# Cyberpunk Multiple Line Chart



#### Bar Chart

In [5]:
# Vertical bar chart
# Do not change the style back to ggplot 
# delete the style.use('ggplot') line of code



Notice that style is still set to cyberpunk.  How do we fix it so we can see the labels?<br> <br>

Answer: change color = 'black' to color = 'white' in the title, label, and tick lines.
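
A minimal sketch of that fix, assuming Matplotlib's built-in 'dark_background' style as a stand-in for cyberpunk (so it runs without mplcyberpunk installed) and made-up bar values:

```python
import matplotlib.pyplot as plt

# 'dark_background' stands in for the cyberpunk style in this sketch
with plt.style.context("dark_background"):
    plt.figure()
    plt.bar(["A", "B", "C"], [3, 5, 2])  # made-up values
    # White text stays readable against the dark background
    plt.title("Made-up totals", color="white")
    plt.xlabel("Country", color="white")
    plt.ylabel("Number of Immigrants", color="white")
    plt.xticks(color="white")
    plt.yticks(color="white")
    ax = plt.gca()
```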

Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [7]:
# Horizontal bar chart
# change style back to ggplot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [8]:
# Grouped bar chart


Notice how the labels in the legend have disappeared?  We can fix this by adding labelcolor = 'black' to plt.legend:<br>
<br>plt.legend(title = 'Country', fontsize = 12, labelcolor = 'black')
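
Here is a small sketch of labelcolor at work. It assumes the labels vanished because the default text color was left light; the rc_context line only simulates that situation, and the bar values are made up.

```python
import matplotlib.pyplot as plt

# Simulate a leftover light text color (an assumption for this sketch)
with plt.rc_context({"text.color": "white"}):
    plt.figure()
    # Two made-up groups drawn side by side
    plt.bar([0, 1], [4, 6], width=0.4, label="India")
    plt.bar([0.4, 1.4], [5, 3], width=0.4, label="China")
    # labelcolor overrides the text color for the legend entries only
    legend = plt.legend(title="Country", fontsize=12, labelcolor="black")
```

Note that labelcolor changes the entry labels, not the legend title.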



Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Area Chart

In [9]:
# Area Chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [10]:
# cyberpunk simple area chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [11]:
# stacked area chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [12]:
# unstacked area chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Box Plot

In [13]:
# Vertical Box Plot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [14]:
# horizontal box plot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Scatter Plot

With the newest version of Seaborn, we have to pass x and y by name.

example: sb.scatterplot(x = 'sepal_length', y = 'sepal_width', data = df_iris)
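
To see the call in isolation, here is a short sketch that builds `df_iris` by hand from a few typed-in rows instead of loading the real data, so the values are for illustration only.

```python
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt

# A handful of hand-typed rows standing in for the iris data set
df_iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.0],
})

# x and y are passed by name; data tells Seaborn where to look them up
ax = sb.scatterplot(x="sepal_length", y="sepal_width", data=df_iris)
ax.set_title("Sepal length vs. sepal width")
```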

In [15]:
#scatter plot comparing sepal length to sepal width


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Histogram

The newest version of Seaborn has deprecated distplot. Replace distplot with histplot and add kde = True.

example:
sb.histplot(df_iris['sepal_length'], color = 'Red', label = 'Sepal Length', kde = True)

In [16]:
#Histogram side by side, with kde


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Bubble Plot

In [17]:
# Bubble Plot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Pie Chart

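The article's code for this chart throws an error, so you can skip it. If you want to try it anyway: with recent Matplotlib versions the likely cause is that plt.pie expects one-dimensional data but is handed a whole DataFrame. The version below passes the total column instead.

df_pie = pd.DataFrame(df.groupby('continent')['total'].sum().T)
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']
explode = [0,0.1,0,0,0.1,0.1]

#pass the 'total' column (one-dimensional) rather than the whole DataFrame
plt.pie(df_pie['total'], colors = colors, autopct = '%1.1f%%', startangle = 90, explode = explode, pctdistance = 1.12, shadow = True)
plt.title('Continent-Wise Immigrants Distribution', color = 'black', y = 1.1, fontsize = 18)
plt.legend(df_pie.index, loc = 'upper left', fontsize = 12)
plt.axis('equal')
plt.savefig('pie.png')

plt.show()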
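
The error above most likely depends on the shape of the data handed to the pie function. The sketch below tries both shapes on a made-up table (`df_toy` and its numbers are invented for illustration): first the whole DataFrame, then a single column.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up totals; only the shape mirrors df_pie
df_toy = pd.DataFrame({"total": [50, 30, 20]}, index=["Asia", "Europe", "Africa"])

# Whole DataFrame (two-dimensional): recent Matplotlib versions reject this
fig_bad, ax_bad = plt.subplots()
try:
    ax_bad.pie(df_toy)
    message = "no error on this Matplotlib version"
except (ValueError, TypeError) as err:
    message = str(err)
print(message)

# Single column (one-dimensional): this draws one wedge per row
fig, ax = plt.subplots()
ax.pie(df_toy["total"], labels=df_toy.index, autopct="%1.1f%%", startangle=90)
ax.axis("equal")
```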

#### Doughnut Chart

In [18]:
# Doughnut Chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Regression Plot

In [19]:
# Strong trend


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [20]:
# Weak trend


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Word Cloud

You might need to pip install wordcloud and pip install pywaffle.  Follow the instructions from when you installed mplcyberpunk earlier.

Make sure to add the txt files from [here](https://github.com/codinglikeagirl42/DataVisualizationPython) to your data folder and remember the path is data/filename.txt. Try creating your own txt file to visualize.

In [21]:
# word cloud
from wordcloud import WordCloud, STOPWORDS


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Lollipop Chart

In [22]:
# Lollipop chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>