# External Influences on California High School Performance
ECE 143 Spring 2018 Group Project
Group 5: Kenny Chen, Kevin Lai, Yan Sun

### 1. Description of Project




### 2. Description of Public Datasets
##### 2.1. California SAT Scores Dataset:
https://www.cde.ca.gov/ds/sp/ai/

##### 2.2. California Crime Dataset:
* Crime Events Data: https://openjustice.doj.ca.gov/data
* Population Data: https://data.ca.gov/dataset/california-population-projection-county-age-gender-and-ethnicity

##### 2.3. California Unemployment Rate Dataset:
https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/

##### 2.4. California Physical Test Dataset:
https://www.cde.ca.gov/ta/tg/pf/pftresearch.asp



# 3. Dependencies

* python3
* pandas == 0.22.0
* numpy == 1.14.1
* pickle
* glob
* zipfile
* imageio == 2.1.2
* matplotlib == 2.2.0
* plotly == 2.7.0
(Sign up for Plotly at https://plot.ly/)

# 4. Preprocess

* Download all the aforementioned datasets.
* Place them in ```./src/data_preprocess/```, alongside the preprocessing notebooks, then run those notebooks.
* The notebooks generate DataFrames of preprocessed data and save them as pickle files.
* These pickle files are saved in ```./data/```.
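
The save-and-reload step can be sketched as follows. This is a minimal illustration using only the standard library: a plain dictionary stands in for a preprocessed DataFrame, and the file name `example.pkl`, the helpers `save_pickle` and `load_pickle`, and the sample values are hypothetical rather than taken from the project notebooks.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path, closing the file even if pickling fails."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Load a previously pickled object from path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for a preprocessed table: county -> {year: rate}
sample = {"County A": {2007: 0.12, 2008: 0.11}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pkl")
    save_pickle(sample, path)
    restored = load_pickle(path)
```

Opening the file in a `with` block ensures the handle is closed after each read or write.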

# 5. Animation

This part uses the Plotly library for animation. A Plotly account is required; sign up at https://plot.ly/.

After signing in, open Settings from the top-right corner and choose API Keys to generate an API key.

In [15]:
import pickle
import plotly
import plotly.plotly as py
import plotly.figure_factory as ff

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Log in with your Plotly username and API key.

In [17]:
# Log in to Plotly; replace the placeholders with your own username and API key
plotly.tools.set_credentials_file(username='YOUR_USERNAME', api_key='YOUR_API_KEY')

# Use this line if you want to open offline mode if needed
# plotly.offline.init_notebook_mode()
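
An API key written directly into a notebook is easy to leak when the notebook is shared. One option, sketched here under the assumption that the credentials are exported as environment variables, is to read them at run time. The variable names `PLOTLY_USERNAME` and `PLOTLY_API_KEY` and the helper `read_plotly_credentials` are hypothetical; they are not a Plotly convention.

```python
import os


def read_plotly_credentials(env=None):
    """Return (username, api_key) from env, raising if either is missing."""
    env = os.environ if env is None else env
    try:
        return env["PLOTLY_USERNAME"], env["PLOTLY_API_KEY"]
    except KeyError as missing:
        raise RuntimeError("Set {} before running the notebook".format(missing))


# Example with an explicit mapping instead of the real environment
username, api_key = read_plotly_credentials(
    {"PLOTLY_USERNAME": "your_username", "PLOTLY_API_KEY": "your_api_key"}
)
```

The returned pair can then be passed to `plotly.tools.set_credentials_file(username=username, api_key=api_key)`.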

Read Basic Map Information for Plotly

In [19]:
df_sample = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/minoritymajority.csv')
df_sample_r = df_sample[df_sample['STNAME'] == 'California']

# fips code is the ID for each county
fips = df_sample_r['FIPS'].tolist()

# define color scale for plotly map
colorscale = [
    'rgb(49,54,149)',
    'rgb(69,117,180)',
    'rgb(116,173,209)',
    'rgb(171,217,233)',
    'rgb(224,243,248)',
    'rgb(254,224,144)',
    'rgb(253,174,97)',
    'rgb(244,109,67)',
    'rgb(215,48,39)',
    'rgb(165,0,38)'
]

With the required information configured, the pipeline for making static maps of the different California datasets can be implemented.
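
Each map below passes eight `binning_endpoints` to `ff.create_choropleth`, which split the plotted values into nine intervals. The following sketch imitates that assignment with the standard library. The helper `bin_index` is hypothetical, and placing a value that equals an endpoint in the lower interval is an assumption, so Plotly's own boundary handling may differ.

```python
from bisect import bisect_left


def bin_index(value, endpoints):
    """Return the 0-based interval a value falls into.

    With n ascending endpoints there are n + 1 intervals: index 0 holds
    everything up to the first endpoint, index n everything above the last.
    """
    return bisect_left(endpoints, value)


# Endpoints used by the crime-rate map
crime_endpoints = [5, 9, 13, 15, 18, 21, 25, 30]

low = bin_index(4.2, crime_endpoints)    # below the first endpoint
edge = bin_index(9, crime_endpoints)     # exactly on an endpoint
high = bin_index(31.0, crime_endpoints)  # above the last endpoint
```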

### 5.1. Static Map for Crime Rate

In [None]:
df_crime = pickle.load(open("./data/crime_rates.pkl","rb"))

for year in range(2000,2015):
    crime_rates = [a * 100 for a in df_crime.xs(year,axis=1).tolist()]

    fig = ff.create_choropleth(
        fips=fips, values=crime_rates, scope=['CA'],
        binning_endpoints=[5, 9, 13, 15, 18, 21, 25, 30], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Crime Rate (Percentage)',
        title='California Crime Rate by Counties in '+str(year)
    )
    
    # Uncomment these two lines for offline Mode
    #filename = 'California_Crime_Rate_Map_Year_'+str(year)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Uncomment these lines for online mode
    filename = 'California_Crime_Rates_Map_Year_'+str(year)+'.png'
    py.image.save_as(fig,filename)
    # Uncomment this line to use online mode to show image in notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')

### 5.2. Static Map for Unemployment Rate

In [None]:
df_unemployment = pickle.load(open("./data/unemployment_rates.pkl","rb"))

for year in range(2007,2015):
    unemployment_rates = [a for a in df_unemployment.xs(str(year),axis=1).tolist()]

    fig = ff.create_choropleth(
        fips=fips, values=unemployment_rates, scope=['CA'],
        binning_endpoints=[2, 3, 5, 7, 10, 13, 15, 22], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Unemployment Rate (Percentage)',
        title='California Unemployment Rate by Counties in '+str(year)
    )

    # Uncomment these two lines for offline Mode
    #filename = 'California_Unemployment_Rate_Map_Year_'+str(year)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Uncomment these lines for online mode
    filename = 'Unemployment_Rate_Map_Year_'+str(year)+'.png'
    py.image.save_as(fig,filename)
    # Uncomment this line to use online mode to show image in notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')

### 5.3. Static Map for Physical Test Pass Rate

In [None]:
df_physical = pickle.load(open("./data/physical_fitness_data.pkl","rb")).transpose()

for year in range(2007,2014):
    physical_scores = [a for a in df_physical.xs(str(year),axis=1).tolist()]

    fig = ff.create_choropleth(
        fips=fips, values=physical_scores, scope=['CA'],
        binning_endpoints=[10, 20, 25, 30, 35, 40, 43, 47], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Physical Pass Rate (%)',
        title='California Physical Pass Rate by Counties in '+str(year)
    )
    
    # Uncomment these two lines for offline Mode
    #filename = 'California_Physical_Test_Score_Map_Year_'+str(year)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Uncomment these lines for online mode
    filename = 'California_Physical_Test_Score_Map_Year_'+str(year)+'.png'
    py.image.save_as(fig,filename)
    # Uncomment this line to use online mode to show image in notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')

### 5.4. Static Map for SAT Scores

In [None]:
df_sat = pickle.load(open("./data/sat_data.pkl","rb"))

for i in range(7,len(df_sat)):
    sat_verbal = df_sat[i].xs(df_sat[i].columns[1],axis=1).tolist()
    sat_math = df_sat[i].xs(df_sat[i].columns[2],axis=1).tolist()
    total = [a+b for a,b in zip(sat_verbal,sat_math)]
    default_none = sum(total)/len(total)
    total.insert(1,default_none)
    if i == 9: total.insert(45,default_none)

    fig = ff.create_choropleth(
        fips=fips, values=total, scope=['CA'],
        binning_endpoints=[880, 910, 940, 970, 1000, 1030, 1060, 1090], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Average SAT Score for Verbal + Math',
        title='California SAT Score by Counties in '+str(2007+i-7)
    )
    
    # Uncomment these lines for offline Mode
    #filename = 'California_SAT_Score_Map_Year_'+str(2007+i-7)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Uncomment these two lines for online mode
    filename = 'California_SAT_Score_Map_Year_'+str(2007+i-7)+'.png'
    py.image.save_as(fig,filename)
    # Uncomment this line to use online mode to show image in notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')
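
The SAT cell above pads each year's list with the mean of the available scores, at index 1 and, when `i == 9`, also at index 45. Presumably those positions correspond to counties with no SAT record for that year; this is an assumption, since the notebook does not say so. A minimal sketch of the padding step, with a hypothetical helper `pad_with_mean` and made-up scores:

```python
def pad_with_mean(values, positions):
    """Return a copy of values with their mean inserted at each position.

    The mean is computed once from the original values, and positions are
    applied in ascending order so later indices refer to the padded list.
    """
    fill = sum(values) / len(values)
    padded = list(values)
    for position in sorted(positions):
        padded.insert(position, fill)
    return padded


# Made-up totals for three counties, with one gap at index 1
padded = pad_with_mean([1000, 900, 950], [1])
```

Filling with the mean keeps the list aligned with the `fips` codes without shifting the yearly average.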

### 5.5. Make gif

In [None]:
import imageio

crime_rate_img_names = [imageio.imread('California_Crime_Rates_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]
unemployment_rate_img_names = [imageio.imread('Unemployment_Rate_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]
physical_scores_img_names = [imageio.imread('California_Physical_Test_Score_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]
sat_scores_img_names = [imageio.imread('California_SAT_Score_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]

In [None]:
imageio.mimsave('crime_rate_animation.gif',crime_rate_img_names,duration=1)
imageio.mimsave('unemployment_rate_animation.gif',unemployment_rate_img_names,duration=1)
imageio.mimsave('physical_pass_rate_animation.gif',physical_scores_img_names,duration=1)
imageio.mimsave('sat_scores_animation.gif',sat_scores_img_names,duration=1)
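
The year ranges used to save the PNG files differ between the sections above, and reading a frame that was never written raises an error. A small pre-check such as the one below lists the missing frames before any GIF is built. The helper `missing_frames` is hypothetical; the file-name pattern mirrors the one used above.

```python
import os
import tempfile


def missing_frames(prefix, years, directory="."):
    """Return the expected frame files that do not exist in directory."""
    names = ["{}{}.png".format(prefix, year) for year in years]
    return [name for name in names
            if not os.path.isfile(os.path.join(directory, name))]


# Demonstration in a temporary directory holding a single 2007 frame
with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = "California_SAT_Score_Map_Year_"
    open(os.path.join(tmp_dir, prefix + "2007.png"), "wb").close()
    missing = missing_frames(prefix, range(2007, 2010), tmp_dir)
```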

# 6. Analysis

Plot the year-over-year trend of each metric, averaged across all California counties.

### 6.1. Crime Rate

In [None]:
df_crime = pickle.load(open("./data/crime_rates.pkl","rb"))
crime_rate_year_avg = []
year_list = [a for a in range(2007,2014)]
for year in year_list:
    crime_rates = [a for a in df_crime.xs(year,axis=1).tolist()]
    crime_rate_year_avg.append(sum(crime_rates)*100/len(crime_rates))

plt.figure()
plt.title("California Average Crime Rate Over years")
plt.ylabel("Average Crime Rates (%)")
plt.xlabel("Year")
plt.plot(year_list,crime_rate_year_avg)
plt.savefig("crime_rate_year_avg.png")
plt.show()
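
Each subsection in this part repeats the same per-year averaging loop. A minimal sketch of that computation as a reusable function, with a plain dictionary standing in for the pickled DataFrame; `yearly_average` and the sample rates are hypothetical:

```python
from statistics import fmean


def yearly_average(values_by_year, years, scale=1.0):
    """Return the mean across counties for each year, multiplied by scale."""
    return [fmean(values_by_year[year]) * scale for year in years]


# Made-up crime rates (as fractions) for two counties over two years
rates = {2007: [0.10, 0.20], 2008: [0.05, 0.15]}
averages = yearly_average(rates, [2007, 2008], scale=100)  # percentages
```

The `scale` argument covers the crime data, which is stored as a fraction and multiplied by 100 before plotting.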

### 6.2. Unemployment Rate

In [None]:
df_unemployment = pickle.load(open("./data/unemployment_rates.pkl","rb"))
unemployment_rate_year_avg = []
year_list = [a for a in range(2007,2014)]
for year in year_list:
    unemployment_rates = [a for a in df_unemployment.xs(str(year),axis=1).tolist()]
    unemployment_rate_year_avg.append(sum(unemployment_rates)/len(unemployment_rates))
    
plt.figure()
plt.title("California Average Unemployment Rate Over years")
plt.ylabel("Average Unemployment Rates (%)")
plt.xlabel("Year")
plt.plot(year_list,unemployment_rate_year_avg)
plt.savefig("unemployment_rate_year_avg.png")
plt.show()    

### 6.3. Physical Test Pass Rate

In [None]:
df_phyisical = pickle.load(open("./data/physical_fitness_data.pkl","rb")).transpose()
physical_scores_year_avg = []
year_list = [a for a in range(2007,2014)]
for year in year_list:
    physical_scores = [a for a in df_phyisical.xs(str(year),axis=1).tolist()]
    physical_scores_year_avg.append(sum(physical_scores)/len(physical_scores))

plt.figure()
plt.title("California Average Physical Pass Rate Over years")
plt.ylabel("Average Physical Pass Rate")
plt.xlabel("Year")
plt.plot(year_list,physical_scores_year_avg)
plt.savefig("physical_scores_year_avg.png")
plt.show()

### 6.4. SAT Score

In [None]:
df_sat = pickle.load(open("./data/sat_data.pkl","rb"))
sat_scores_year_avg = []
year_list = [a for a in range(2007,2014)]
for i in range(7,len(df_sat)):
    sat_verbal = df_sat[i].xs(df_sat[i].columns[1],axis=1).tolist()
    sat_math = df_sat[i].xs(df_sat[i].columns[2],axis=1).tolist()
    total = [a+b for a,b in zip(sat_verbal,sat_math)]
    default_none = sum(total)/len(total)
    total.insert(1,default_none)
    if i == 9: total.insert(45,default_none)
    sat_scores_year_avg.append(sum(total)/len(total))

plt.figure()
plt.title("California Average SAT Scores Over years")
plt.ylabel("Average SAT Scores")
plt.xlabel("Year")
plt.plot(year_list,sat_scores_year_avg)
plt.savefig("sat_scores_year_avg.png")
plt.show()

# 7. Scatter Plot

Averages across all counties cannot show the details of any correlation. More plots are required for further analysis, especially pairwise analysis.

In [None]:
df_sat = pickle.load(open("./data/sat_data.pkl","rb"))
df_physical = pickle.load(open("./data/physical_fitness_data.pkl","rb"))
df_unemployment = pickle.load(open("./data/unemployment_rates.pkl","rb"))
df_crime = pickle.load(open("./data/crime_rates.pkl","rb"))

Collect SAT statistical data

In [None]:
sat_total = []
sat_total_county = np.zeros((1,58))
for i in range(7,len(df_sat)):
    sat_verbal = df_sat[i].xs(df_sat[i].columns[1],axis=1).tolist()
    sat_math = df_sat[i].xs(df_sat[i].columns[2],axis=1).tolist()
    total = [a+b for a,b in zip(sat_verbal,sat_math)]
    default_none = sum(total)/len(total)
    total.insert(1,default_none)
    if i == 9: total.insert(45,default_none)
    sat_total.append(total)
    sat_total_county += np.array(total)

### 7.1. SAT score vs Unemployment Rate

In [None]:
# Make scatter plot
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    x_list = df_unemployment.xs(str(year),axis=1).tolist()
    y_list = sat_total[year-2007]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Total SAT Scores")
    plt.xlabel("Unemployment Rate (%)")
plt.title("Total SAT Scores vs Unemployment Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
plt.show()

# Make scatter plot with trend line
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    x_list = df_unemployment.xs(str(year),axis=1).tolist()
    y_list = sat_total[year-2007]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Total SAT Scores")
    plt.xlabel("Unemployment Rate (%)")
plt.title("Total SAT Scores vs Unemployment Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
np.random.seed(123)
for year in range(2007,2014):
    x_list = df_unemployment.xs(str(year),axis=1).tolist()
    y_list = sat_total[year-2007]
    z = np.polyfit(x_list, y_list, 1)
    p = np.poly1d(z)
    plt.plot(x_list,p(x_list),"r--", c=np.random.rand(3,))
plt.show()
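
`np.polyfit(x, y, 1)` returns the slope and intercept of the least-squares line, which `np.poly1d` then evaluates. For reference, the same fit can be written out with the closed-form formulas. `fit_line` is a hypothetical helper and the data points are made up:

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

A negative slope in the plots above would indicate that the score tends to fall as the rate rises.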

### 7.2. SAT Score vs Crime Rate

In [None]:
# Make scatter plot
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    x_list = [a*100 for a in df_crime.xs(year,axis=1).tolist()]
    y_list = sat_total[year-2007]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Total SAT Scores")
    plt.xlabel("Crime Rate (%)")
plt.title("Total SAT Scores vs Crime Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
plt.show()

# Make scatter plot with trend line
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    x_list = [a*100 for a in df_crime.xs(year,axis=1).tolist()]
    y_list = sat_total[year-2007]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Total SAT Scores")
    plt.xlabel("Crime Rate (%)")
plt.title("Total SAT Scores vs Crime Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
np.random.seed(123)
for year in range(2007,2014):
    x_list = [a*100 for a in df_crime.xs(year,axis=1).tolist()]
    y_list = sat_total[year-2007]
    z = np.polyfit(x_list, y_list, 1)
    p = np.poly1d(z)
    plt.plot(x_list,p(x_list),"r--", c=np.random.rand(3,))
plt.show()

### 7.3. Physical Test Pass Rate vs Unemployment Rate

In [None]:
# Make scatter plot
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    # Unemployment rates are already stored as percentages, so no scaling is applied
    x_list = df_unemployment.xs(str(year),axis=1).tolist()
    y_list = [a for a in df_physical.transpose().xs(str(year),axis=1).tolist()]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Physical Test Pass Rate (%)")
    plt.xlabel("Unemployment Rate (%)")
plt.title("Physical Test Pass Rate vs Unemployment Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
plt.show()

# Make scatter plot with trend line
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    x_list = df_unemployment.xs(str(year),axis=1).tolist()
    y_list = [a for a in df_physical.transpose().xs(str(year),axis=1).tolist()]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Physical Test Pass Rate (%)")
    plt.xlabel("Unemployment Rate (%)")
plt.title("Physical Test Pass Rate vs Unemployment Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
np.random.seed(123)
for year in range(2007,2014):
    x_list = df_unemployment.xs(str(year),axis=1).tolist()
    y_list = [a for a in df_physical.transpose().xs(str(year),axis=1).tolist()]
    z = np.polyfit(x_list, y_list, 1)
    p = np.poly1d(z)
    plt.plot(x_list,p(x_list),"r--", c=np.random.rand(3,))    
plt.show()

### 7.4. Physical Test Pass Rate vs Crime Rate

In [None]:
# Make scatter plot
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    x_list = [a*100 for a in df_crime.xs(year,axis=1).tolist()]
    y_list = [a for a in df_physical.transpose().xs(str(year),axis=1).tolist()]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Physical Test Pass Rate (%)")
    plt.xlabel("Crime Rate (%)")
plt.title("Physical Test Pass Rate vs Crime Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
plt.show()

# Make scatter plot with trend line
np.random.seed(123)
plt.figure()
for year in range(2007,2014):
    x_list = [a*100 for a in df_crime.xs(year,axis=1).tolist()]
    y_list = [a for a in df_physical.transpose().xs(str(year),axis=1).tolist()]
    plt.scatter(x_list,y_list, c=np.random.rand(3,))
    plt.ylabel("Physical Test Pass Rate (%)")
    plt.xlabel("Crime Rate (%)")
plt.title("Physical Test Pass Rate vs Crime Rate\n")
plt.legend([str(a) for a in range(2007,2014)])
np.random.seed(123)
for year in range(2007,2014):
    x_list = [a*100 for a in df_crime.xs(year,axis=1).tolist()]
    y_list = [a for a in df_physical.transpose().xs(str(year),axis=1).tolist()]
    z = np.polyfit(x_list, y_list, 1)
    p = np.poly1d(z)
    plt.plot(x_list,p(x_list),"r--", c=np.random.rand(3,))
plt.show()

Scatter plots with trend lines show the basic direction of the correlation for each analyzed pair. To examine how the values are distributed within each pair, more advanced plots are needed.
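
A single number can complement the trend lines: the Pearson correlation coefficient ranges from -1 to 1 and measures the strength of a linear relationship. It is not computed in the notebooks; the sketch below is an illustrative addition with a hypothetical helper `pearson_r` and made-up values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up example in which the score falls steadily as the rate rises
r = pearson_r([5, 10, 15, 20], [1050, 1000, 950, 900])
```

Here the points lie exactly on a falling line, so `r` is at the negative end of the range.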

# 8. Box Plot and Violin Plot

Collect Pair Data for SAT Score

In [None]:
crime_sat_pair = []
unemployment_sat_pair = []
for year in year_list:
    crime_rates = [a for a in df_crime.xs(year,axis=1).tolist()]
    unemployment_rates = [a for a in df_unemployment.xs(str(year),axis=1).tolist()]

    # Index 7 of df_sat corresponds to 2007
    i = year-2007+7
    sat_verbal = df_sat[i].xs(df_sat[i].columns[1],axis=1).tolist()
    sat_math = df_sat[i].xs(df_sat[i].columns[2],axis=1).tolist()
    total = [a+b for a,b in zip(sat_verbal,sat_math)]
    default_none = sum(total)/len(total)
    total.insert(1,default_none)
    if i == 9: total.insert(45,default_none)

    crime_sat_pair += [(a,b) for a,b in zip(crime_rates,total)]
    unemployment_sat_pair += [(a,b) for a,b in zip(unemployment_rates,total)]

### 8.1. SAT Score vs Crime Rate

In [None]:
crime_sat_pair_batch_1 = []
crime_sat_pair_batch_2 = []
crime_sat_pair_batch_3 = []
crime_sat_pair_batch_4 = []

for t in crime_sat_pair:
    if t[0]<=0.1:
        crime_sat_pair_batch_1.append(t[1])
    if t[0]>0.1 and t[0]<=0.15:
        crime_sat_pair_batch_2.append(t[1])
    if t[0]>0.15 and t[0]<=0.2:
        crime_sat_pair_batch_3.append(t[1])
    if t[0]>0.2:
        crime_sat_pair_batch_4.append(t[1])
        
crime_sat_pair_batches = [crime_sat_pair_batch_1,
                          crime_sat_pair_batch_2,
                          crime_sat_pair_batch_3,
                          crime_sat_pair_batch_4]
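
The chain of `if` statements above sorts each SAT total into a batch by crime rate. The same grouping can be expressed with a single threshold list, which makes the cut points easier to change. A minimal sketch; `group_by_thresholds` is a hypothetical helper and the pairs are made up:

```python
from bisect import bisect_left


def group_by_thresholds(pairs, thresholds):
    """Split the second element of each pair into batches by the first.

    Batch k holds pairs whose first element lies in
    (thresholds[k-1], thresholds[k]], matching the `<=` comparisons above.
    """
    batches = [[] for _ in range(len(thresholds) + 1)]
    for key, value in pairs:
        batches[bisect_left(thresholds, key)].append(value)
    return batches


# Made-up (crime rate, SAT total) pairs
pairs = [(0.05, 1000), (0.12, 950), (0.18, 900), (0.30, 880), (0.10, 990)]
batches = group_by_thresholds(pairs, [0.1, 0.15, 0.2])
```

The resulting list of lists can be passed to `plt.boxplot` or `plt.violinplot` in the same way as the hand-built batches.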

In [None]:
plt.xticks(np.arange(1,5),['rate<=10%', '10%<rate<=15%', '15%<rate<=20%', 'rate>20%'])
plt.ylabel('SAT Score')
plt.xlabel('Crime Rate')
plt.title('SAT Score vs Crime Rate \nBox Plot')
plt.boxplot(crime_sat_pair_batches, vert=True,showmeans=True)
plt.show()

plt.xticks(np.arange(1,5),['rate<=10%', '10%<rate<=15%', '15%<rate<=20%', 'rate>20%'])
plt.ylabel('SAT Score')
plt.xlabel('Crime Rate')
plt.title('SAT Score vs Crime Rate \nViolin Plot')
plt.violinplot(crime_sat_pair_batches, vert=True,showmeans=True)
plt.show()
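
A box plot drawn with `showmeans=True` marks the median, the first and third quartiles, and the mean of each batch. To read those numbers directly, they can be computed with the standard library. A minimal sketch; `box_summary` is a hypothetical helper, the sample scores are made up, and the `inclusive` quantile method is chosen on the assumption that it matches the linear interpolation Matplotlib uses. Whisker positions are not computed here.

```python
from statistics import fmean, quantiles


def box_summary(values):
    """Return the quartiles and mean that a box plot of values would mark."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "mean": fmean(values)}


# Made-up SAT totals for one batch
summary = box_summary([900, 950, 1000, 1050, 1100])
```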

### 8.2. SAT Score vs Unemployment Rate

In [None]:
unemployment_sat_pair_batch_1 = []
unemployment_sat_pair_batch_2 = []
unemployment_sat_pair_batch_3 = []
unemployment_sat_pair_batch_4 = []

# Unemployment rates are stored as percentages, so the thresholds are 10, 15 and 20
for t in unemployment_sat_pair:
    if t[0]<=10:
        unemployment_sat_pair_batch_1.append(t[1])
    if t[0]>10 and t[0]<=15:
        unemployment_sat_pair_batch_2.append(t[1])
    if t[0]>15 and t[0]<=20:
        unemployment_sat_pair_batch_3.append(t[1])
    if t[0]>20:
        unemployment_sat_pair_batch_4.append(t[1])
        
unemployment_sat_pair_batches = [unemployment_sat_pair_batch_1,
                                 unemployment_sat_pair_batch_2,
                                 unemployment_sat_pair_batch_3,
                                 unemployment_sat_pair_batch_4]

In [None]:
plt.xticks(np.arange(1,5),['rate<=10%', '10%<rate<=15%', '15%<rate<=20%', 'rate>20%'])
plt.ylabel('SAT Score')
plt.xlabel('Unemployment Rate')
plt.title('SAT Score vs Unemployment Rate \nBox Plot')
plt.boxplot(unemployment_sat_pair_batches, vert=True,showmeans=True)
plt.show()

plt.xticks(np.arange(1,5),['rate<=10%', '10%<rate<=15%', '15%<rate<=20%', 'rate>20%'])
plt.ylabel('SAT Score')
plt.xlabel('Unemployment Rate')
plt.title('SAT Score vs Unemployment Rate \nViolin Plot')
plt.violinplot(unemployment_sat_pair_batches, vert=True,showmeans=True)
plt.show()

Collect Pair Data for Physical Test Pass Rate

In [None]:
crime_physical_pair = []
unemployment_physical_pair = []
for year in year_list:
    crime_rates = [a for a in df_crime.xs(year,axis=1).tolist()]
    unemployment_rates = [a for a in df_unemployment.xs(str(year),axis=1).tolist()]
    physical_scores = [a for a in df_phyisical.xs(str(year),axis=1).tolist()]
    crime_physical_pair += [(a,b) for a,b in zip(crime_rates,physical_scores)]
    unemployment_physical_pair += [(a,b) for a,b in zip(unemployment_rates,physical_scores)]

### 8.3. Physical Test Pass Rate vs Crime Rate

In [None]:
crime_physical_pair_batch_1 = []
crime_physical_pair_batch_2 = []
crime_physical_pair_batch_3 = []

for t in crime_physical_pair:
    if t[0]<=0.1:
        crime_physical_pair_batch_1.append(t[1])
    if t[0]>0.1 and t[0]<=0.2:
        crime_physical_pair_batch_2.append(t[1])
    if t[0]>0.2:
        crime_physical_pair_batch_3.append(t[1])
        
crime_physical_pair_batches = [crime_physical_pair_batch_1,
                               crime_physical_pair_batch_2,
                               crime_physical_pair_batch_3]

In [None]:
plt.xticks(np.arange(1,4),['rate<=10%', '10%<rate<=20%', 'rate>20%'])
plt.ylabel('Physical Test Pass Rate')
plt.xlabel('Crime Rate')
plt.title('Physical Test Pass Rate vs Crime Rate \nBox Plot')
plt.boxplot(crime_physical_pair_batches, vert=True,showmeans=True)
plt.show()

plt.xticks(np.arange(1,4),['rate<=10%', '10%<rate<=20%', 'rate>20%'])
plt.ylabel('Physical Test Pass Rate')
plt.xlabel('Crime Rate')
plt.title('Physical Test Pass Rate vs Crime Rate \nViolin Plot')
plt.violinplot(crime_physical_pair_batches, vert=True,showmeans=True)
plt.show()

### 8.4. Physical Test Pass Rate vs Unemployment Rate

In [None]:
unemployment_physical_pair_batch_1 = []
unemployment_physical_pair_batch_2 = []
unemployment_physical_pair_batch_3 = []
unemployment_physical_pair_batch_4 = []

for t in unemployment_physical_pair:
    if t[0]<=5:
        unemployment_physical_pair_batch_1.append(t[1])
    if t[0]>5 and t[0]<=10:
        unemployment_physical_pair_batch_2.append(t[1])
    if t[0]>10 and t[0]<=15:
        unemployment_physical_pair_batch_3.append(t[1])
    if t[0] > 15:
        unemployment_physical_pair_batch_4.append(t[1])
        
unemployment_physical_pair_batches = [unemployment_physical_pair_batch_1,
                                      unemployment_physical_pair_batch_2,
                                      unemployment_physical_pair_batch_3,
                                      unemployment_physical_pair_batch_4]

In [None]:
plt.xticks(np.arange(1,5),['rate<=5%', '5%<rate<=10%', '10%<rate<=15%', 'rate>15%'])
plt.ylabel('Physical Pass Rate')
plt.xlabel('Unemployment Rate')
plt.title('Physical Pass Rate vs Unemployment Rate \nBox Plot')
plt.boxplot(unemployment_physical_pair_batches, vert=True,showmeans=True)
plt.show()

plt.xticks(np.arange(1,5),['rate<=5%', '5%<rate<=10%', '10%<rate<=15%', 'rate>15%'])
plt.ylabel('Physical Pass Rate')
plt.xlabel('Unemployment Rate')
plt.title('Physical Pass Rate vs Unemployment Rate \nViolin Plot')
plt.violinplot(unemployment_physical_pair_batches, vert=True,showmeans=True)
plt.show()