## A step-by-step guide to Data Visualizations in Python

Follow along with [this article](https://medium.com/codex/step-by-step-guide-to-data-visualizations-in-python-b322129a1540). We will be using [this .xlsx dataset](https://www.kaggle.com/roshansharma/immigration-to-canada-ibm-dataset) from Kaggle on immigration to Canada from 1980 to 2013. There is no need to download it; it is already in the Git repository you forked.<br>

#### We suggest that instead of copying and pasting the code, you type it out. This will help you become more familiar with the syntax and understand it better.


### Step-1: Importing Packages

In [1]:
# Import all libraries and modules needed
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
from matplotlib import style

style.use('ggplot')
plt.rcParams['figure.figsize'] = (20,10)


### Step-2 : Importing and Cleaning Data 

In [11]:
# Import and clean the data; remember the path is data/Canada.xlsx
df = pd.read_excel('data/Canada.xlsx', sheet_name=2) #, skiprows = range(20), skipfooter = 2)

In [12]:
df.head(20)

Unnamed: 0,Type,Coverage,OdName,AREA,AreaName,REG,RegName,DEV,DevName,1980,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013
0,Immigrants,Foreigners,Afghanistan,935,Asia,5501,Southern Asia,902,Developing regions,16,...,2978,3436,3009,2652,2111,1746,1758,2203,2635,2004
1,Immigrants,Foreigners,Albania,908,Europe,925,Southern Europe,901,Developed regions,1,...,1450,1223,856,702,560,716,561,539,620,603
2,Immigrants,Foreigners,Algeria,903,Africa,912,Northern Africa,902,Developing regions,80,...,3616,3626,4807,3623,4005,5393,4752,4325,3774,4331
3,Immigrants,Foreigners,American Samoa,909,Oceania,957,Polynesia,902,Developing regions,0,...,0,0,1,0,0,0,0,0,0,0
4,Immigrants,Foreigners,Andorra,908,Europe,925,Southern Europe,901,Developed regions,0,...,0,0,1,1,0,0,0,0,1,1
5,Immigrants,Foreigners,Angola,903,Africa,911,Middle Africa,902,Developing regions,1,...,268,295,184,106,76,62,61,39,70,45
6,Immigrants,Foreigners,Antigua and Barbuda,904,Latin America and the Caribbean,915,Caribbean,902,Developing regions,0,...,14,24,32,15,32,38,27,37,51,25
7,Immigrants,Foreigners,Argentina,904,Latin America and the Caribbean,931,South America,902,Developing regions,368,...,1591,1153,847,620,540,467,459,278,263,282
8,Immigrants,Foreigners,Armenia,935,Asia,922,Western Asia,902,Developing regions,0,...,147,224,218,198,205,267,252,236,258,207
9,Immigrants,Foreigners,Australia,909,Oceania,927,Australia and New Zealand,901,Developed regions,702,...,930,909,875,1033,1018,1018,933,851,982,1121


In [23]:
# errors='ignore' lets this cell be re-run safely; without it, a second run raises a KeyError because the columns are already gone
df.drop(['AREA','REG','DEV','Type','Coverage','DevName'], axis=1, inplace=True, errors='ignore')
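A `KeyError ... not found in axis` from `df.drop` usually means the cell was run a second time, after the columns had already been removed. The sketch below reproduces this on a tiny stand-in frame; the column names mirror the notebook, but the values are made up for illustration.

```python
import pandas as pd

# Tiny stand-in frame; values are invented, only the column names echo the notebook.
demo = pd.DataFrame({"OdName": ["A", "B"], "AREA": [935, 908], "1980": [16, 1]})

demo.drop(["AREA"], axis=1, inplace=True)      # first run: the column is removed

try:
    demo.drop(["AREA"], axis=1, inplace=True)  # second run: the column is already gone
    second_run_failed = False
except KeyError:
    second_run_failed = True

# errors="ignore" skips labels that are missing, so re-running is harmless.
demo.drop(columns=["AREA"], errors="ignore", inplace=True)
remaining = list(demo.columns)
```

If you would rather start from a clean slate, re-running the `read_excel` cell before the drop cell works as well.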

In [34]:
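df.rename(columns={'OdName':'country', 'AreaName':'continent', 'RegName':'region'}, inplace=True)
# numeric_only=True sums just the numeric columns and skips the text columns
df['total'] = df.sum(axis=1, numeric_only=True)
# set_index returns a new DataFrame; the result is not assigned, so 'country' stays a column for the check below
df.set_index('country')
df.value_counts('country')

#df.tail()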


country
Afghanistan      1
New Caledonia    1
Nicaragua        1
Niger            1
Nigeria          1
                ..
Germany          1
Ghana            1
Greece           1
Grenada          1
Zimbabwe         1
Length: 196, dtype: int64
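Two details of the cell above are easy to miss: summing across a row only makes sense for the numeric columns, and `set_index` hands back a new DataFrame rather than changing `df`. A minimal sketch on made-up data (the frame and its values are placeholders, not the Canada dataset):

```python
import pandas as pd

# Placeholder data shaped like the cleaned frame: text columns plus year columns.
demo = pd.DataFrame({
    "country": ["A", "B"],
    "continent": ["X", "Y"],
    "1980": [10, 1],
    "1981": [20, 2],
})

# Sum only the numeric (year) columns; the text columns are skipped.
demo["total"] = demo.sum(axis=1, numeric_only=True)

# set_index returns a new frame; demo itself still has 'country' as a column.
indexed = demo.set_index("country")
```

To make the change stick, assign the result back (`df = df.set_index('country')`), keeping in mind that later cells which use `df['country']` would then need `df.index` instead.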

In [45]:
pd.set_option('display.max_rows', None)
df['country'].value_counts().sort_index()  # sort_index() orders the counts alphabetically by country
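A pandas Series has no `sort_value` method. The two sorting methods it does have are `sort_index()` (order by the labels, here the country names) and `sort_values()` (order by the counts). The sketch below uses a short made-up list of names and assumes the goal is to spot countries that appear more than once.

```python
import pandas as pd

# Made-up names; "Ghana" is repeated on purpose.
countries = pd.Series(["Ghana", "Albania", "Ghana", "Zimbabwe"], name="country")
counts = countries.value_counts()

by_name = counts.sort_index()                   # alphabetical by country
by_count = counts.sort_values(ascending=False)  # largest count first
duplicates = counts[counts > 1]                 # only names that occur more than once
```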

### Step-3 : Creating Beautiful Visualizations

#### Line Chart

In [1]:
# Single line chart



In [2]:
# Multiple Line chart 


#### Let's talk about style

In [3]:
# Shows all available built-in styles
print(plt.style.available)

To see a visualization of the available style sheets, [click here](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html).

The syntax to select a specific style is: plt.style.use('style_name')

Try it out by adding that line of code to the top of the code block above and choosing one of the preinstalled styles. Which style is your favorite?<br><br>
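As a rough illustration of switching styles, the sketch below plots a made-up series under one built-in style. `plt.style.use(...)` changes the style for everything that follows, while `plt.style.context(...)` (used here) applies it only inside the `with` block, which is handy when comparing styles side by side. The data values are invented.

```python
import matplotlib.pyplot as plt

years = [1980, 1990, 2000, 2010]
values = [10, 25, 18, 30]  # made-up numbers, not from the dataset

# Fall back to the default style if 'ggplot' is somehow unavailable.
chosen = "ggplot" if "ggplot" in plt.style.available else "default"

# The style applies only to figures created inside this block.
with plt.style.context(chosen):
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(years, values, label="example series")
    ax.set_title(f"Style: {chosen}")
    ax.legend()
```

In a notebook the figure is displayed at the end of the cell; swap `chosen` for any name printed by `plt.style.available` to compare.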

What happens when you change the line plt.legend(loc = 'upper left', fontsize = 12) to plt.legend(loc = 'lower right', fontsize = 12)? <br><br><br><br>
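One way to see the effect of `loc` is to draw the same chart twice with different legend positions. This is only a sketch with two invented series; the `loc` and `fontsize` arguments are the ones from the question above.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

for ax, loc in zip(axes, ["upper left", "lower right"]):
    # Two made-up series, just so the legend has something to describe.
    ax.plot([1980, 1990, 2000], [1, 2, 3], label="series A")
    ax.plot([1980, 1990, 2000], [3, 2, 1], label="series B")
    ax.legend(loc=loc, fontsize=12)   # only the legend position differs between panels
    ax.set_title(f"loc = '{loc}'")

titles = [ax.get_title() for ax in axes]
```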



Experiment with changing other lines of the code and see how the graph changes. Add any notes or observations here. Going forward, feel free to experiment with each graph type.<br><br><br><br><br><br>

#### Install mplcyberpunk
Open a terminal window and at the prompt type:

python -V

If it reports Python 3.x, copy and paste: pip install mplcyberpunk
If it reports Python 2.x, copy and paste: pip3 install mplcyberpunk


For more info on mplcyberpunk click [here.](https://github.com/dhaitz/mplcyberpunk)
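Once the package is installed, importing it registers a `cyberpunk` style with matplotlib. The sketch below checks whether the import is possible and otherwise falls back to the built-in `dark_background` style; that fallback is an assumption added here so the cell runs either way, and the plotted numbers are made up.

```python
import importlib.util
import matplotlib.pyplot as plt

if importlib.util.find_spec("mplcyberpunk") is not None:
    import mplcyberpunk  # importing the package registers the 'cyberpunk' style
    style_name = "cyberpunk"
else:
    style_name = "dark_background"  # built-in stand-in when mplcyberpunk is missing

with plt.style.context(style_name):
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot([1980, 1990, 2000, 2010], [5, 12, 9, 20], marker="o", label="made-up series")
    ax.legend()
    if style_name == "cyberpunk":
        mplcyberpunk.add_glow_effects(ax)  # optional glow around the lines
```

If `style_name` comes out as `dark_background`, the install did not land in the Python environment that the notebook is using.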

In [4]:
# Cyberpunk Multiple Line Chart



#### Bar Chart

In [5]:
# Vertical bar chart
# Do not change the style back to ggplot 
# delete the style.use('ggplot') line of code



Notice that style is still set to cyberpunk.  How do we fix it so we can see the labels?<br> <br>

Answer: change the color = 'black' to 'white'
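To illustrate why the text color matters on a dark style, here is a minimal sketch that draws value labels in white above each bar. It uses the built-in `dark_background` style as a stand-in for cyberpunk, and the categories and totals are placeholders.

```python
import matplotlib.pyplot as plt

names = ["A", "B", "C"]   # placeholder categories
totals = [30, 45, 12]     # made-up values

with plt.style.context("dark_background"):
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(names, totals)
    # White text stays readable on the dark background; black would vanish.
    labels = [
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(), str(value),
                ha="center", va="bottom", color="white")
        for bar, value in zip(bars, totals)
    ]
    ax.set_title("Totals by category", color="white")

label_colors = [t.get_color() for t in labels]
```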

Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [7]:
# Horizontal bar chart
# Change the style back to ggplot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [8]:
# Grouped bar chart


Notice how the labels in the legend have disappeared? We can fix this by adding labelcolor='black' to plt.legend:<br>
<br>plt.legend(title = 'Country', fontsize = 12, labelcolor='black')
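A small sketch of the `labelcolor` fix. To imitate light text left over from a dark style, it temporarily sets the default text color to white with `plt.rc_context`; that setup is an assumption about why the labels vanish, and the bars are made-up numbers.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Simulate leftover white text from a previously applied dark style.
with plt.rc_context({"text.color": "white"}):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar([0.8, 1.8], [3, 5], width=0.4, label="Country A")
    ax.bar([1.2, 2.2], [4, 2], width=0.4, label="Country B")
    # labelcolor overrides the default text color for the legend entries.
    legend = ax.legend(title="Country", fontsize=12, labelcolor="black")
    # The legend title is colored separately from the entries.
    legend.get_title().set_color("black")

legend_colors = [to_rgba(t.get_color()) for t in legend.get_texts()]
```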



Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Area Chart

In [9]:
# Area Chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [10]:
# cyberpunk simple area chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [11]:
# stacked area chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [12]:
# unstacked area chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Box Plot

In [13]:
# Vertical Box Plot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [14]:
# horizontal box plot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Scatter Plot

With the newest version of Seaborn, we have to specify x and y explicitly.

example: sb.scatterplot(x = 'sepal_length', y = 'sepal_width', data = df_iris)
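The keyword form can be tried on a tiny hand-made frame before loading the real iris data. The column names below match the example above; the four rows are invented for illustration and `df_demo` is a hypothetical name.

```python
import pandas as pd
import seaborn as sb

# Four invented rows using the iris column names.
df_demo = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.3, 2.7],
    "species": ["setosa", "setosa", "virginica", "virginica"],
})

# x and y are passed by keyword; hue colors the points by species.
ax = sb.scatterplot(data=df_demo, x="sepal_length", y="sepal_width", hue="species")
ax.set_title("Sepal length vs. sepal width")
```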

In [15]:
#scatter plot comparing sepal length to sepal width


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Histogram

With the newest version of Seaborn, distplot has been deprecated. Replace distplot with histplot and add kde = True.

example:
sb.histplot(df_iris['sepal_length'], color = 'Red', label = 'Sepal Length', kde = True)

In [16]:
#Histogram side by side, with kde


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Bubble Plot

In [17]:
# Bubble Plot


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Pie Chart

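The code from the article throws an error here, most likely because plt.pie expects 1-D data and is handed a whole DataFrame. Passing the 'total' column instead should avoid that; skip this one if it still fails.

# Total immigrants per continent (one row per continent)
df_pie = pd.DataFrame(df.groupby('continent')['total'].sum().T)
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']
explode = [0,0.1,0,0,0.1,0.1]

# Pass the 'total' column (1-D) rather than the whole DataFrame
plt.pie(df_pie['total'], colors = colors, autopct = '%1.1f%%', startangle = 90, explode = explode, pctdistance = 1.12, shadow = True)
plt.title('Continent-Wise Immigrants Distribution', color = 'black', y = 1.1, fontsize = 18)
plt.legend(df_pie.index, loc = 'upper left', fontsize = 12)
plt.axis('equal')
plt.savefig('pie.png')

plt.show()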

#### Doughnut Chart

In [18]:
# Doughnut Chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Regression Plot

In [19]:
# Strong trend


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

In [20]:
# Weak trend


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Word Cloud

You might need to pip install wordcloud and pip install pywaffle. Follow the instructions from when you installed cyberpunk earlier.

Make sure to add the txt files from [here](https://github.com/codinglikeagirl42/DataVisualizationPython) to your data folder and remember the path is data/filename.txt. Try creating your own txt file to visualize.

In [21]:
# word cloud
from wordcloud import WordCloud, STOPWORDS


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>

#### Lollipop Chart

In [22]:
# Lollipop chart


Feel free to experiment and add any notes or observations here. <br><br><br><br><br>