# External Influences on California High School Performance
ECE 143 Spring 2018 Group Project
Group 5: Kenny Chen, Kevin Lai, Yan Sun

### 1. Description of Project




### 2. Description of Public Datasets
##### 2.1. California SAT Scores Dataset:
https://www.cde.ca.gov/ds/sp/ai/

##### 2.2. California Crime Dataset:
* Crime Events Data: https://openjustice.doj.ca.gov/data
* Population Data: https://data.ca.gov/dataset/california-population-projection-county-age-gender-and-ethnicity

##### 2.3. California Unemployment Rate Dataset:
https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/

##### 2.4. California Physical Test Dataset:
https://www.cde.ca.gov/ta/tg/pf/pftresearch.asp



# 3. Dependencies

* python3
* pandas == 0.22.0
* numpy == 1.14.1
* pickle
* glob
* zipfile
* imageio == 2.1.2
* matplotlib == 2.2.0
* plotly == 2.7.0
(Sign up for Plotly at https://plot.ly/)

# 4. Preprocess

* Download all of the datasets listed above.
* Place them in ```./src/data_preprocess/```, alongside the preprocessing notebooks, then run those notebooks.
* The notebooks generate DataFrames of preprocessed data and save them as pickle files.
* These pickle files are saved in ```./data/```.
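
As a rough illustration of the last two steps, the sketch below writes a small DataFrame to a pickle file and reads it back. The table contents, the file name `example_rates.pkl`, and the temporary output directory are assumptions made for this example; the actual preprocessing notebooks may structure their data differently.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical preprocessed table: rows are counties, columns are years.
df_example = pd.DataFrame(
    {2007: [5.1, 4.8], 2008: [7.2, 6.9]},
    index=["Alameda", "Alpine"],
)

# Write to a temporary directory so the sketch does not touch ./data/.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "example_rates.pkl")

# Save the DataFrame as a pickle file.
with open(out_path, "wb") as f:
    pickle.dump(df_example, f)

# Load it back, as the animation cells do with the files in ./data/.
with open(out_path, "rb") as f:
    df_loaded = pickle.load(f)
```

Opening the file in a `with` block closes it as soon as the read or write finishes.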

# 5. Animation

This part uses the plotly library for animation. A Plotly account is required; sign up at https://plot.ly/.

After signing in, open the menu in the top right corner, then go to Settings and API Keys to generate an API key.

In [15]:
import pickle
import plotly
import plotly.plotly as py
import plotly.figure_factory as ff

import numpy as np
import pandas as pd

Log in here with your username and api_key.

In [17]:
# Use this line to log in to Plotly
plotly.tools.set_credentials_file(username='yansun1996', api_key='qkwnFoQhO3qdkWGLe2vL')

# Use this line instead if you want to enable offline mode
# plotly.offline.init_notebook_mode()
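
One possible alternative to hard-coding credentials in the notebook is to read them from environment variables. The sketch below is only an illustration: the helper `read_plotly_credentials` and the variable names `PLOTLY_USERNAME` and `PLOTLY_API_KEY` are hypothetical choices for this example, not part of the original project.

```python
import os


def read_plotly_credentials(env=os.environ):
    """Return (username, api_key) from a mapping, or raise if either is missing.

    The variable names below are assumptions chosen for this sketch.
    """
    username = env.get("PLOTLY_USERNAME")
    api_key = env.get("PLOTLY_API_KEY")
    if not username or not api_key:
        raise RuntimeError("Set PLOTLY_USERNAME and PLOTLY_API_KEY first.")
    return username, api_key


# Demonstration with an explicit mapping instead of the real environment.
demo_username, demo_api_key = read_plotly_credentials(
    {"PLOTLY_USERNAME": "example_user", "PLOTLY_API_KEY": "example_key"}
)
```

The returned pair could then be passed to `plotly.tools.set_credentials_file(username=..., api_key=...)` as in the cell above.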

Read Basic Map Information for Plotly

In [19]:
df_sample = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/minoritymajority.csv')
df_sample_r = df_sample[df_sample['STNAME'] == 'California']

# fips code is the ID for each county
fips = df_sample_r['FIPS'].tolist()

# define color scale for plotly map
colorscale = [
    'rgb(49,54,149)',
    'rgb(69,117,180)',
    'rgb(116,173,209)',
    'rgb(171,217,233)',
    'rgb(224,243,248)',
    'rgb(254,224,144)',
    'rgb(253,174,97)',
    'rgb(244,109,67)',
    'rgb(215,48,39)',
    'rgb(165,0,38)'
]

### 5.1 Static Map for Crime Rate

In [None]:
df_crime = pickle.load(open("./data/crime_rates.pkl","rb"))

for year in range(2000,2015):
    crime_rates = [a * 100 for a in df_crime.xs(year,axis=1).tolist()]

    fig = ff.create_choropleth(
        fips=fips, values=crime_rates, scope=['CA'],
        binning_endpoints=[5, 9, 13, 15, 18, 21, 25, 30], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Crime Rate (Percentage)', title='California Crime Rate by Counties in '+str(year)
    )
    
    # Offline mode: uncomment these two lines
    #filename = 'California_Crime_Rate_Map_Year_'+str(year)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Online mode (enabled by default): save the map as a PNG file
    filename = 'California_Crime_Rates_Map_Year_'+str(year)+'.png'
    py.image.save_as(fig,filename)
    # Online mode: uncomment this line to show the image in the notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')
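
The eight values in `binning_endpoints` split the data into nine bins. To get a feel for how a set of endpoints distributes the counties, a quick check along these lines may help. It is a sketch under assumptions: the example rates are made up, and the rule that a value equal to an endpoint falls into the higher bin is chosen for this example and may not match Plotly's exact boundary handling.

```python
from bisect import bisect_right
from collections import Counter

# Endpoints copied from the crime-rate map.
binning_endpoints = [5, 9, 13, 15, 18, 21, 25, 30]


def bin_index(value, endpoints):
    """Return the 0-based bin a value falls into.

    Assumption: a value equal to an endpoint goes into the higher bin.
    """
    return bisect_right(endpoints, value)


# Hypothetical crime rates (percent) for a handful of counties.
example_rates = [4.9, 12.0, 12.5, 22.3, 31.0]
bins = [bin_index(v, binning_endpoints) for v in example_rates]

# Number of counties per bin; sparse or crowded bins suggest new endpoints.
counts_per_bin = Counter(bins)
```

If most counties land in one or two bins, the map will look nearly uniform, which is a hint to adjust the endpoints.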

### 5.2 Static Map for Unemployment Rate

In [None]:
df_unemployment = pickle.load(open("./data/unemployment_rates.pkl","rb"))

for year in range(2007,2015):
    unemployment_rates = [a for a in df_unemployment.xs(str(year),axis=1).tolist()]

    fig = ff.create_choropleth(
        fips=fips, values=unemployment_rates, scope=['CA'],
        binning_endpoints=[2, 3, 5, 7, 10, 13, 15, 22], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Unemployment Rate (Percentage)', title='California Unemployment Rate by Counties in '+str(year)
    )

    # Offline mode: uncomment these two lines
    #filename = 'California_Unemployment_Rate_Map_Year_'+str(year)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Online mode (enabled by default): save the map as a PNG file
    filename = 'Unemployment_Rate_Map_Year_'+str(year)+'.png'
    py.image.save_as(fig,filename)
    # Online mode: uncomment this line to show the image in the notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')

### 5.3 Static Map for Physical Test Pass Rate

In [None]:
df_physical = pickle.load(open("./data/physical_fitness_data.pkl","rb")).transpose()

for year in range(2007,2014):
    physical_scores = [a for a in df_physical.xs(str(year),axis=1).tolist()]

    fig = ff.create_choropleth(
        fips=fips, values=physical_scores, scope=['CA'],
        binning_endpoints=[10, 20, 25, 30, 35, 40, 43, 47], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Physical Pass Rate (%)', title='California Physical Pass Rate by Counties in '+str(year)
    )
    
    # Offline mode: uncomment these two lines
    #filename = 'California_Physical_Test_Score_Map_Year_'+str(year)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Online mode (enabled by default): save the map as a PNG file
    filename = 'California_Physical_Test_Score_Map_Year_'+str(year)+'.png'
    py.image.save_as(fig,filename)
    # Online mode: uncomment this line to show the image in the notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')

### 5.4 Static Map for SAT Scores

In [None]:
df_sat = pickle.load(open("./data/sat_data.pkl","rb"))

for i in range(7,len(df_sat)):
    sat_verbal = df_sat[i].xs(df_sat[i].columns[1],axis=1).tolist()
    sat_math = df_sat[i].xs(df_sat[i].columns[2],axis=1).tolist()
    total = [a+b for a,b in zip(sat_verbal,sat_math)]
    default_none = sum(total)/len(total)
    total.insert(1,default_none)
    if i == 9: total.insert(45,default_none)

    fig = ff.create_choropleth(
        fips=fips, values=total, scope=['CA'],
        binning_endpoints=[880, 910, 940, 970, 1000, 1030, 1060, 1090], 
        colorscale=colorscale,
        round_legend_values=True,
        county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
        show_state_data=True,
        legend_title='Average SAT Score for Verbal + Math',
        title='California SAT Score by Counties in '+str(2007+i-7)
    )
    
    # Offline mode: uncomment these two lines
    #filename = 'California_SAT_Score_Map_Year_'+str(2007+i-7)
    #plotly.offline.iplot(fig,image="png",filename=filename)
    
    # Online mode (enabled by default): save the map as a PNG file
    filename = 'California_SAT_Score_Map_Year_'+str(2007+i-7)+'.png'
    py.image.save_as(fig,filename)
    # Online mode: uncomment this line to show the image in the notebook
    #py.iplot(fig, filename='choropleth_california_and_surr_states_outlines')
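
The SAT cell fills counties without data by inserting the statewide mean at fixed list positions. A less position-dependent variant could key the scores by FIPS code and fill any missing county with the mean. The sketch below illustrates that idea only; the helper `fill_missing_with_mean` and the sample scores are hypothetical and not taken from the project data.

```python
def fill_missing_with_mean(fips_codes, scores_by_fips):
    """Return scores ordered like fips_codes, using the mean for missing counties."""
    known = [scores_by_fips[code] for code in fips_codes if code in scores_by_fips]
    if not known:
        raise ValueError("No county has a score, so no mean can be computed.")
    mean_score = sum(known) / len(known)
    return [scores_by_fips.get(code, mean_score) for code in fips_codes]


# Hypothetical example: the second county has no SAT data.
example_fips = [6001, 6003, 6005]
example_scores = {6001: 1000, 6005: 1100}
filled_scores = fill_missing_with_mean(example_fips, example_scores)
```

Because the result follows the order of the FIPS list, it lines up with the `fips` argument of `ff.create_choropleth` without manual index bookkeeping.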

### 5.5 Make gif

In [None]:
import imageio

crime_rate_img_names = [imageio.imread('California_Crime_Rates_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]
unemployment_rate_img_names = [imageio.imread('Unemployment_Rate_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]
physical_scores_img_names = [imageio.imread('California_Physical_Test_Score_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]
sat_scores_img_names = [imageio.imread('California_SAT_Score_Map_Year_'+
                    str(year)+'.png') for year in range(2007,2014)]

In [None]:
imageio.mimsave('crime_rate_animation.gif',crime_rate_img_names,duration=1)
imageio.mimsave('unemployment_rate_animation.gif',unemployment_rate_img_names,duration=1)
imageio.mimsave('physical_pass_rate_animation.gif',physical_scores_img_names,duration=1)
imageio.mimsave('sat_scores_animation.gif',sat_scores_img_names,duration=1)
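
Since `imageio.imread` fails on a missing file, it can be useful to confirm that every yearly frame exists before building a GIF. The following is a minimal sketch using only the standard library; the helpers `frame_paths` and `missing_frames` are hypothetical names introduced for this example, and the demonstration uses a temporary directory with one empty stand-in file.

```python
import os
import tempfile


def frame_paths(prefix, years, directory="."):
    """Build the expected PNG path for each year."""
    return [os.path.join(directory, prefix + str(year) + ".png") for year in years]


def missing_frames(paths):
    """Return the paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]


# Demonstration in a temporary directory holding one placeholder frame.
demo_dir = tempfile.mkdtemp()
demo_paths = frame_paths("California_SAT_Score_Map_Year_", range(2007, 2010), demo_dir)
open(demo_paths[0], "wb").close()  # empty stand-in for the 2007 frame
demo_missing = missing_frames(demo_paths)
```

An empty result from `missing_frames` means every frame is present and the paths can be read in order.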

# 6. Analysis

Plot the yearly trend of each measure, averaged over all counties in California.
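
A minimal sketch of the averaging step is shown below, assuming a DataFrame with one row per county and one column per year (the layout suggested by the `xs(year, axis=1)` calls in the animation cells). The county names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical rates: rows are counties, columns are years.
df_rates = pd.DataFrame(
    {2007: [5.0, 7.0, 9.0], 2008: [6.0, 8.0, 13.0]},
    index=["County A", "County B", "County C"],
)

# Average over counties (rows) to get one value per year.
yearly_mean = df_rates.mean(axis=0)
```

The resulting Series is indexed by year, so it can be passed to a line plot with years on the x-axis.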