<h1 style="color:blue;">Scenario 3 - Part 1</h1>  

- **C1.S3.Py01 - Bar Plots using MatPlotLib and Seaborn**
- **C1.S3.Py02 - Bar Plots with Percentages**
- **C1.S3.Py03 - Bar Plots with Different Groupings**
- **C1.S3.Py03b - Other Categorical Plots**
- **C1.S3.Py04 - Pie Charts Compared to Bar Plots**
- **C1.S3.Py05 - Histograms**
- **C1.S3.Py06 - Describing Data**
- **C1.S3.Py07 - Box Plots**
- **C1.S3.Py08 - Sub Plots for Comparisons**
- **C1.S3.Py09 - How to Transform Data**

In [None]:
#Code Block 1

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns



#style options 

%matplotlib inline  
#if you want graphs to display automatically without calling plt.show()

pd.set_option('display.max_columns',500) #allows for up to 500 columns to be displayed when viewing a dataframe

#a style that can be used for plots; Matplotlib 3.6+ renamed 'seaborn' to 'seaborn-v0_8'
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')



In [None]:
#Code Block 2
df = pd.read_csv('data/Scenario3.csv', index_col = 0, header=0) 
    #index_col=0 sets the first column as the index
    #header=0 uses the top row as the column headers
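
To see what `index_col=0` and `header=0` do, here is a minimal sketch that reads a small in-memory CSV instead of the real file. The column names and values are invented for illustration and are not taken from `Scenario3.csv`.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the real file
csv_text = "id,Amount Funded,Loan Purpose\n101,5000,car\n102,12000,house\n103,7500,car\n"

demo = pd.read_csv(io.StringIO(csv_text), index_col=0, header=0)

print(demo.index.name)      # the first column ('id') became the index
print(list(demo.columns))   # the top row supplied the remaining column names
```

Because the first column is moved into the index, it no longer appears in `demo.columns`.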

In [None]:
#Code Block 3
df.info()

In [None]:
df['Home Ownership'].value_counts()

### References for visualizations using MatPlotLib or Seaborn

#### Good reference for different plots
- http://python-graph-gallery.com/

#### Seaborn Reference Guide with examples
- https://seaborn.pydata.org/

#### MatplotLib Reference Guide with examples
- https://matplotlib.org/

<h2 style="color:blue;">C1.S3.Py01 - Creating Bar Plots using MatPlotLib and Seaborn</h2> 


#### Reference for Bar Plots
http://python-graph-gallery.com/barplot/

**Note:** MatPlotLib and Seaborn were imported in Code Block 1 as plt and sns

In [None]:
#Code Block 4
sns.barplot(x = "Amount Funded", data = df)
#plt.savefig('plots/barplot_h_AmountFunded.png')

In [None]:
#Code Block 5
sns.barplot(y = "Amount Funded", data = df, color='green')
#plt.savefig('plots/barplot_h_AmountFunded.png')

### Adding a title and x- and y-axis labels to the plot

In [None]:
#Code Block 6
sns.barplot(y = "Amount Funded", data = df)
plt.title('Average Amount Funded', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
plt.xlabel('Amount Funded')
plt.ylabel('Dollars')
#plt.savefig('plots/barplot_h_AmountFundedWithTitle.png')

### Group a barplot based on Loan Purpose

In [None]:
#Code Block 7
sns.set(style='white')
sns.barplot(x = "Amount Funded", y = "Loan Purpose", data = df, palette = 'deep')
#plt.savefig('plots/barplot_AmountFunded_LoanPurpose.png')

### How to create a grouped barplot with MatPlotLib

In [None]:
#Code Block 8
result = df.groupby(["Loan Purpose"])['Amount Funded'].mean().reset_index()
result

In [None]:
#Code Block 9
result = result.sort_values('Amount Funded', ascending = False)
result

In [None]:
#Code Block 10
sns.set(style='white')
result.plot(kind='bar', x = "Loan Purpose", y = "Amount Funded", figsize=(20,4))
plt.title('Average Amount Funded', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')

### Comparable seaborn barplot 

In [None]:
#Code Block 11
sns.set(style='dark')
plt.figure(figsize=(20,4))
sns.barplot(x = "Loan Purpose", y = "Amount Funded",  data = df, order = result['Loan Purpose'], palette = 'deep')
plt.title('Average Amount Funded', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
#plt.savefig('barplot_AmountFunded_LoanPurposeOrder.png')

<h2 style="color:blue;">C1.S3.Py02 - Creating Bar Plots with Percentages</h2> 

In [None]:
#Code Block 12
df_loanpurpose = df.groupby(["Loan Purpose"])['Amount Funded'].count().reset_index()
df_loanpurpose

In [None]:
#Code Block 13
df_loanpurpose.plot(kind='bar', x = "Loan Purpose", y = "Amount Funded", figsize=(20,4))
plt.title('Count Amount Funded', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')

In [None]:
#Code Block 14
sns.set(style='whitegrid')
plt.figure(figsize=(20,4))
sns.countplot(x = "Loan Purpose",  data = df, order = result['Loan Purpose'], palette = 'deep')
plt.title('Count of Loans by Purpose', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
plt.xlabel('Loan Purpose', fontweight='bold', color = 'green', fontsize='14', horizontalalignment='center')
plt.ylabel('Count', fontweight='bold', color = 'green', fontsize='14', horizontalalignment='center')

In [None]:
#Code Block 15
df_total = df_loanpurpose['Amount Funded'].sum()
df_total

In [None]:
#Code Block 16
df_loanpurpose['Funded Percent'] = df_loanpurpose['Amount Funded'] / df_total
print(df_loanpurpose['Funded Percent'].sum())
df_loanpurpose = df_loanpurpose.sort_values('Amount Funded', ascending = False)
df_loanpurpose
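
The shares computed above can be cross-checked with `value_counts(normalize=True)`, which counts rows per category and divides by the total in one step. This is a minimal sketch on invented data (the purposes listed are placeholders, not values from the dataset), and the two approaches should agree only when neither column has missing values.

```python
import pandas as pd

# Hypothetical loan purposes; the real column is df['Loan Purpose']
purposes = pd.Series(['car', 'house', 'car', 'medical', 'car', 'house'], name='Loan Purpose')

counts = purposes.value_counts()                 # rows per category, largest first
shares = purposes.value_counts(normalize=True)   # the same counts divided by the total

print(counts)
print(shares)
print(shares.sum())   # the fractions add up to 1, up to floating-point rounding
```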

In [None]:
#Code Block 17
df_loanpurpose.plot(kind='bar', x = "Loan Purpose", y = "Funded Percent", figsize=(20,10), color = 'green')
plt.title('Percentage of Loans per Purpose', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
#plt.savefig('barplot_AmountFundedPercent_LoanPurpose.png')

### Embellishments of a plot
 - palette - https://seaborn.pydata.org/tutorial/color_palettes.html
 - order - sets the order of the bars, for example from largest to smallest or smallest to largest
 - title - top title for the plot, you can also change its format
 - legend - https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html

<h2 style="color:blue;">C1.S3.Py03 - Creating Bar Plots with Different Groupings</h2>  

- Look at groupings to combine categories

In [None]:
#Code Block 18
sns.set(style='white')
plt.figure(figsize=(20,8))
sns.countplot(x = "Loan Purpose",  data = df, palette="Paired", order = df_loanpurpose['Loan Purpose'])
plt.title('Count of Loans by Purpose', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')

In [None]:
#Code Block 19
df_loanpurpose.loc[df_loanpurpose['Funded Percent'] < 0.05, 'Loan Category'] = 'Other' 
df_loanpurpose.loc[df_loanpurpose['Funded Percent'] >= 0.05, 'Loan Category'] = df_loanpurpose['Loan Purpose']

In [None]:
#Code Block 20
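#sum only the numeric columns so the text column 'Loan Purpose' is not concatenated
df_loanpurpose_other = df_loanpurpose.groupby('Loan Category')[['Amount Funded', 'Funded Percent']].sum().reset_index()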
df_loanpurpose_other
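
The idea of folding small categories into an 'Other' bucket can also be written with `Series.where`, which keeps a value where a condition holds and substitutes a fallback elsewhere. The sketch below uses invented counts and the same 5% threshold; the frame `demo` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical counts of loans per purpose
demo = pd.DataFrame({
    'Loan Purpose': ['debt', 'car', 'house', 'medical', 'vacation'],
    'Amount Funded': [60, 25, 10, 3, 2],
})
demo['Funded Percent'] = demo['Amount Funded'] / demo['Amount Funded'].sum()

# Keep the purpose where its share is at least 5%, otherwise label it 'Other'
demo['Loan Category'] = demo['Loan Purpose'].where(demo['Funded Percent'] >= 0.05, 'Other')

# Sum only the numeric columns within each category
grouped = demo.groupby('Loan Category')[['Amount Funded', 'Funded Percent']].sum().reset_index()
print(grouped)
```

Here 'medical' and 'vacation' fall below the threshold, so they end up combined in a single 'Other' row.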

In [None]:
#Code Block 21
sns.set(style='dark')
plt.figure(figsize=(20,4))
sns.barplot(x = "Loan Category", y = "Amount Funded",  data = df_loanpurpose_other, palette = 'deep')
plt.title('Count of Loans Funded', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
plt.xlabel('Loan Category', fontweight='bold', color = 'green', fontsize='14', horizontalalignment='center')
plt.ylabel('Count', fontweight='bold', color = 'green', fontsize='14', horizontalalignment='center')

In [None]:
#Code Block 22
sns.set(style='dark')
plt.figure(figsize=(20,4))
sns.barplot(x = "Loan Category", y = "Funded Percent",  data = df_loanpurpose_other, palette = 'Purples', order = df_loanpurpose_other['Loan Category'])
plt.title('Percentage of Loans Funded', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
plt.xlabel('Loan Category', fontweight='bold', color = 'green', fontsize='14', horizontalalignment='center')
plt.ylabel('Percentage', fontweight='bold', color = 'green', fontsize='14', horizontalalignment='center')

<h2 style="color:blue;">C1.S3.Py03b - Other Categorical Plots</h2>  

- Plotting with categorical data - https://seaborn.pydata.org/tutorial.html

### Categorical scatterplots:

- `stripplot` (with `kind="strip"`; the default)
- `swarmplot` (with `kind="swarm"`)

### Categorical distribution plots:

- `boxplot` (with `kind="box"`)
- `violinplot` (with `kind="violin"`)
- `boxenplot` (with `kind="boxen"`)

### Categorical estimate plots:

- `pointplot` (with `kind="point"`)
- `barplot` (with `kind="bar"`)
- `countplot` (with `kind="count"`)

In [None]:
#Code Block 23
sns.set(style='dark')
#catplot is a figure-level function that creates its own figure, so the size is set on g below
g = sns.catplot(y = "Loan Purpose", x = "Amount Funded", data = df, palette = 'Dark2_r')
g.fig.set_size_inches(20,5)

In [None]:
#Code Block 24
sns.set(style='dark')

g = sns.catplot(y = "Loan Purpose", x = "Amount Funded", data = df, palette = 'deep', hue = "Home Ownership")
g.fig.set_size_inches(30,10)

In [None]:
#Code Block 25
sns.set(style='dark')
plt.figure(figsize=(20,8))
sns.stripplot(y = "Loan Purpose", x = "Amount Funded",  data = df, hue = 'Home Ownership', dodge=True, alpha=.25, zorder=1, palette="GnBu_r")
sns.pointplot(y = "Loan Purpose", x = "Amount Funded",  data = df, hue = 'Home Ownership', dodge=.532, join=False, palette="Reds",
              markers="d", scale=1, ci=None)


In [None]:
#Code Block 26
sns.set(style='white')
plt.figure(figsize=(20,4))
sns.countplot(x = "Loan Purpose",  data = df, palette="Paired", order = df_loanpurpose['Loan Purpose'])
plt.title('Count of Loans by Purpose', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')

In [None]:
#Code Block 27
sns.set(style='dark')
plt.figure(figsize=(20,8))
sns.countplot(x = "Loan Purpose",  data = df, order = df_loanpurpose['Loan Purpose'], palette = 'deep', hue = 'Home Ownership')
plt.title('Count of Loans by Purpose and Home Ownership', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')

In [None]:
#Code Block 28
sns.set(style='dark')
plt.figure(figsize=(20,4))
sns.barplot(x = "Loan Purpose", y = "Amount Funded",  capsize=.1, data = df, order = result['Loan Purpose'], palette = 'deep', hue = 'Home Ownership')
plt.title('Average Amount Funded', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
plt.legend(loc='upper right')

<h2 style="color:blue;">C1.S3.Py04 - Creating Pie Charts Compared to Bar Plots</h2>   

https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pie.html?highlight=pie#matplotlib.pyplot.pie

In [None]:
#Code Block 29
df_homeownership = df.groupby('Home Ownership')['Amount Funded'].count().reset_index()
df_homeownership = df_homeownership.sort_values('Amount Funded', ascending = False)
df_homeownership

In [None]:
#Code Block 30
plt.pie(df_homeownership['Amount Funded'], labels=df_homeownership['Home Ownership'], shadow=True, startangle=200)
plt.savefig('plots/piechart_HomeOwnership.png')

In [None]:
#Code Block 31
plt.figure(figsize=(20,20))
plt.pie(df_homeownership['Amount Funded'], labels=df_homeownership['Home Ownership'], shadow=False, startangle=200)

In [None]:
#Code Block 32
plt.figure(figsize=(10,10))
plt.pie(df_homeownership['Amount Funded'], explode = (0, 0, .2, 0, 0), wedgeprops = {'linewidth': 3}, labels=df_homeownership['Home Ownership'], startangle=140, autopct='%1.1f%%')
plt.savefig('plots/piechart_HomeOwnership_Explode.png')

### Creating subplots to compare pie charts and bar plots

- Link to show how to create subplots - https://python-graph-gallery.com/194-split-the-graphic-window-with-subplot/

In [None]:
#Code Block 33
plt.style.use('ggplot')
plt.figure(figsize=(20,8))
plt.subplot(121)
plt.pie(df_homeownership['Amount Funded'], wedgeprops = {'linewidth': 3}, labels=df_homeownership['Home Ownership'], startangle=140, autopct='%1.1f%%')
plt.subplot(122)
sns.barplot(y = "Home Ownership", x = "Amount Funded",  data = df_homeownership)
plt.savefig('plots/piechart_barplot_HomeOwnership.png')

In [None]:
#Code Block 34
sns.set(style='dark')
plt.figure(figsize=(20,16))
plt.subplot(121)
plt.pie(df_loanpurpose['Funded Percent'], labels=df_loanpurpose['Loan Purpose'], startangle=180, autopct='%1.1f%%')
plt.subplot(122)
sns.barplot(y = "Loan Purpose", x = "Amount Funded",  data = df_loanpurpose, palette = 'deep')
plt.savefig('plots/piechart_barplot_LoanPurpose.png')

In [None]:
#Code Block 35
sns.set(style='dark')
plt.figure(figsize=(20,8))
plt.subplot(121)
plt.pie(df_loanpurpose_other['Funded Percent'], explode = (0.2, 0, 0), wedgeprops = {'linewidth': 3}, labels=df_loanpurpose_other['Loan Category'], startangle=140, autopct='%1.1f%%')
plt.subplot(122)
sns.barplot(x = "Loan Category", y = "Amount Funded",  data = df_loanpurpose_other, palette = 'deep')
plt.savefig('plots/piechart_barplot_LoanCategory.png')

<h2 style="color:blue;">C1.S3.Py05 - Creating a Histogram</h2> 

In [None]:
#Code Block 36
# Plot a simple histogram with binsize determined automatically
sns.distplot(df['Interest Rate'], kde=False, color="b")

In [None]:
#Code Block 37
# Plot a kernel density estimate and rug plot
sns.distplot(df['Interest Rate'], hist=False, rug=True, color="r")

In [None]:
#Code Block 38
sns.distplot(df['Interest Rate'], hist=False, color="g", kde_kws={"shade": True})

In [None]:
#Code Block 39
# Plot a histogram and kernel density estimate and color magenta
sns.distplot(df['Interest Rate'], color="m", bins=10)
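
A histogram with `bins=10` splits the observed range into 10 equal-width intervals and counts the values that fall in each. The sketch below reproduces that counting step with `np.histogram` on randomly generated rates; the numbers are synthetic and are not the dataset's interest rates.

```python
import numpy as np

rng = np.random.default_rng(0)
rates = rng.normal(loc=12.0, scale=3.0, size=1000)   # synthetic stand-in for df['Interest Rate']

counts, edges = np.histogram(rates, bins=10)

print(counts)   # number of values in each bin
print(edges)    # 11 edges delimit the 10 bins
```

When a density curve is drawn on top, the bar heights are rescaled to a density rather than raw counts, but the bin edges are found the same way.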

<h2 style="color:blue;">C1.S3.Py06 - Describing Data</h2>  

In [None]:
#Code Block 40
df.head()

In [None]:
#Code Block 41
df.describe()

In [None]:
#Code Block 42
df.describe(include='all')

In [None]:
#Code Block 43
df[['Interest Rate', 'Annual Income', 'Amount Funded']].describe()

In [None]:
#Code Block 44
df['Interest Rate'].describe()

In [None]:
#Code Block 45
df_int_rate = df['Interest Rate'].describe().reset_index()
df_int_rate

In [None]:
#Code Block 46
int_skew = df['Interest Rate'].skew()
int_skew

In [None]:
#Code Block 47
new_row = {'index':'skew', 'Interest Rate': int_skew}
#append the row to the dataframe (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df_int_rate = pd.concat([df_int_rate, pd.DataFrame([new_row])], ignore_index=True)
df_int_rate

In [None]:
#Code Block 48
int_median = df['Interest Rate'].median()
new_row = {'index':'median', 'Interest Rate': int_median}
#append the row to the dataframe
df_int_rate = pd.concat([df_int_rate, pd.DataFrame([new_row])], ignore_index=True)

int_var = df['Interest Rate'].var()
new_row = {'index':'var', 'Interest Rate': int_var}
#append the row to the dataframe
df_int_rate = pd.concat([df_int_rate, pd.DataFrame([new_row])], ignore_index=True)

df_int_rate
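
As an alternative to adding rows one at a time, `Series.agg` can compute a chosen list of statistics in a single call. This is a minimal sketch on made-up rates, not the dataset's values.

```python
import pandas as pd

# Hypothetical interest rates with one large value to create right skew
rates = pd.Series([5.0, 7.5, 10.0, 12.5, 30.0], name='Interest Rate')

summary = rates.agg(['mean', 'median', 'var', 'skew'])
print(summary)
```

The result is a Series indexed by statistic name, so `summary['median']` retrieves a single value.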

<h2 style="color:blue;">C1.S3.Py07 - Creating Box Plots</h2>   

In [None]:
#Code Block 49
df['Amount Funded'].plot(kind= 'box')

In [None]:
#Code Block 51
sns.boxplot(y = "Amount Funded",  data = df)

In [None]:
#Code Block 52
plt.figure(figsize=(20,8))
sns.boxplot(y = "Amount Funded", x = "Home Ownership", data = df)
plt.savefig('plots/Boxplot_Homeownership.png')

In [None]:
#Code Block 53
plt.figure(figsize=(20,8))
sns.boxplot(x = "Amount Funded", y = "Home Ownership", data = df)

In [None]:
#Code Block 54
plt.figure(figsize=(20,8))
sns.boxplot(y = "Amount Funded", x = "Loan Purpose", data = df)
plt.savefig('plots/Boxplot_LoanPurpose.png')
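
The box in a box plot spans the first to third quartile, with a line at the median. By default, points farther than 1.5 times the interquartile range (IQR) from the box are drawn individually as outliers. The sketch below computes those quantities by hand on invented amounts.

```python
import pandas as pd

# Hypothetical funded amounts with one extreme value
amounts = pd.Series([1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 50000])

q1, median, q3 = amounts.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Values beyond these fences are plotted as individual outlier points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = amounts[(amounts < lower_fence) | (amounts > upper_fence)]

print(q1, median, q3, iqr)
print(lower_fence, upper_fence)
print(outliers.tolist())
```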

<h2 style="color:blue;">C1.S3.Py08 - How to Create Sub Plots for Comparisons</h2>

In [None]:
#Code Block 55
plt.figure(figsize=(16,8))
plt.title('Boxplot', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
sns.boxplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Greens")

In [None]:
#Code Block 56
plt.figure(figsize=(20,8))
plt.title('Violinplot', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
sns.violinplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Blues")

In [None]:
#Code Block 57
plt.figure(figsize=(20,8))
plt.title('Boxenplot', fontweight='bold', color = 'red', fontsize='17', horizontalalignment='center')
sns.boxenplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Reds")

In [None]:
#Code Block 58
plt.figure(figsize=(20,4))
plt.subplot(131)
plt.title('Boxplot', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
sns.boxplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Greens")
plt.subplot(132)
plt.title('Violinplot', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
sns.violinplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Blues")
plt.subplot(133)
plt.title('Boxenplot', fontweight='bold', color = 'red', fontsize='17', horizontalalignment='center')
sns.boxenplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Reds")

In [None]:
#Code Block 59
df_income_onemil = df[df['Annual Income'] < 1000000]

In [None]:
#Code Block 60
plt.figure(figsize=(16,8))
plt.title('Boxplot', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
sns.boxplot(y = "Annual Income", x = "Home Ownership", data = df_income_onemil, palette = "Greens")


In [None]:
#Code Block 61
plt.figure(figsize=(16,8))
plt.title('Boxplot', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
plt.ylim((0,1000000))
sns.boxplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Greens")


In [None]:
#Code Block 62
plt.figure(figsize=(16,8))
plt.title('Boxplot', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
plt.ylim((0,400000))
sns.boxplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Greens")


In [None]:
#Code Block 63
plt.figure(figsize=(20,12))
ax1 = plt.subplot2grid((2, 2), (0, 0), colspan=2)
plt.title('Boxplot', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
plt.ylim((0,400000))
ax1 = sns.boxplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Greens")
ax2 = plt.subplot2grid((2, 2), (1, 0), colspan=1)
plt.title('Violinplot', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
ax2 = sns.violinplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Blues")
ax3 = plt.subplot2grid((2, 2), (1, 1), colspan=1)
plt.ylim((0,1000000))
plt.title('Boxenplot', fontweight='bold', color = 'red', fontsize='17', horizontalalignment='center')
ax3 = sns.boxenplot(y = "Annual Income", x = "Home Ownership", data = df, palette = "Reds")
plt.savefig('plots/Box_Violin_BoxenPlots_Homeownership.png')

<h2 style="color:blue;">C1.S3.Py09 - How to Transform Data</h2>

In [None]:
#Code Block 64
df_income = df['Annual Income']
df_income = pd.DataFrame(df_income).reset_index()
df_income

### Z-Score Explained
- https://www.statisticshowto.com/probability-and-statistics/z-score/

#### Example of a z-score
- mean = 10
- standard deviation = 5
- X = 17.5
- Z-score = (17.5 - 10) / 5 = 1.5
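
The worked example above can be checked with a small helper. The function name `z_score` is chosen here for illustration. Note that when the standard deviation comes from pandas, `Series.std()` returns the sample standard deviation (`ddof=1`) by default.

```python
def z_score(x, mean, stdev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / stdev

print(z_score(17.5, 10, 5))   # 1.5
print(z_score(10, 10, 5))     # 0.0, a value equal to the mean
```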

In [None]:
#Code Block 65
mean_income = df_income['Annual Income'].mean()
stdev_income = df_income['Annual Income'].std()
df_income['z_score'] = ((df_income['Annual Income']-mean_income)/stdev_income)
df_income

### What is a log?
- How many of one number do we multiply to get another number?

#### log(e) 
- Also known as the natural log; its base is Euler's number, e, which is approximately 2.718
- Example: loge(7.389) ≈ 2 - because 2.71828^2 ≈ 7.389

#### log2
- What is log2(64)?
- We are asking "how many 2s need to be multiplied together to get 64?"
- 2 × 2 × 2 × 2 × 2 × 2 = 64, so we need 6 of the 2s
- Answer: log2(64) = 6

#### log10
- What is log10(1000)?
- We are asking "how many 10s need to be multiplied together to get 1000?"
- 10 × 10 × 10 = 1000, so we need 3 of the 10s
- Answer: log10(1000) = 3
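
The three examples above can be verified with Python's standard `math` module, which provides the same three bases as the NumPy functions `np.log`, `np.log2`, and `np.log10`.

```python
import math

print(math.log(7.389))    # natural log, very close to 2
print(math.log2(64))      # 6.0
print(math.log10(1000))   # 3.0
```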

In [None]:
#Code Block 66
df_income['loge'] = np.log(df_income['Annual Income'])
df_income['log2'] = np.log2(df_income['Annual Income'])
df_income['log10'] = np.log10(df_income['Annual Income'])
df_income.head(15)

In [None]:
#Code Block 67
plt.figure(figsize=(20,16))
plt.subplot(231)
plt.title('Raw Data', fontweight='bold', color = 'green', fontsize='17', horizontalalignment='center')
sns.distplot(df_income['Annual Income'], color="g", bins = 20)
plt.subplot(232)
plt.title('Z Score', fontweight='bold', color = 'blue', fontsize='17', horizontalalignment='center')
sns.distplot(df_income['z_score'], color="b", bins = 20)
plt.subplot(233)
plt.title('Natural log', fontweight='bold', color = 'red', fontsize='17', horizontalalignment='center')
sns.distplot(df_income['loge'], color="r", bins = 20)
plt.subplot(234)
plt.title('Log2', fontweight='bold', color = 'black', fontsize='17', horizontalalignment='center')
sns.distplot(df_income['log2'], color="y", bins = 20)
plt.subplot(235)
plt.title('Log10', fontweight='bold', color = 'red', fontsize='17', horizontalalignment='center')
sns.distplot(df_income['log10'], color="m", bins = 20)
plt.subplot(236)
plt.title('Log10 (50 bins)', fontweight='bold', color = 'red', fontsize='17', horizontalalignment='center')
sns.distplot(df_income['log10'], color="m", bins = 50)
plt.savefig('plots/Transform_Histogram_AnnualIncome.png')