# 8. Predicting Exemplary K-8 Chicago Schools

<h4>Overview</h4>
This lab is a friendly competition to predict whether a K-8 Chicago school's Illinois State Board of Education Summative Designation is 'Exemplary' (the non-exemplary designations are "Commendable", "Targeted", and "Comprehensive").

The prediction will be based on the file MiddleSchool.xlsx with the following data:
    
<ul>
<li>Student Enrollment - Black or African American</li>     
<li> Student Enrollment - Hispanic or Latino </li>           
<li> Student Enrollment - Children with Disabilities</li>     
<li> Student Enrollment - Low Income   </li>                  
<li> Total Number of School Days </li>                         
<li> 8th Grade Passing Algebra 1    </li>                     
<li> Student Attendance Rate     </li>                          
<li> Student Chronic Truancy Rate  </li>                       
<li> Avg Class Size – All Grades   </li>                       
<li> Teacher Retention Rate </li> </ul>


<h4>Scoring </h4>
Scoring is based on the values in the <i>confusion matrix</i>:
\begin{pmatrix}
TP & FN \\
FP & TN 
\end{pmatrix}
where
<ul>
    <li> TP=True Positive: your model predicts exemplary and the school is exemplary</li>
    <li> TN=True Negative: your model predicts not exemplary and the school is not exemplary</li>
    <li> FP=False Positive: your model predicts exemplary but the school is not exemplary</li>
    <li> FN=False Negative: your model predicts not exemplary but the school is exemplary</li>
    </ul>
    
 The number of each type of prediction then determines
 
 <ul>
    <li> <b>Accuracy </b> = $\frac{\mid TP\mid + \mid TN \mid}{\mid TP\mid + \mid TN \mid+ \mid FP\mid + \mid FN \mid}$    (proportion that were correctly predicted out of all the schools)</li>
 
 
<li> <b> Precision</b> = $\frac{\mid TP\mid}{\mid TP\mid + \mid FP\mid }$ (proportion that were correct out of those you predicted to be exemplary) </li>
 
<li><b> Sensitivity (Recall) </b> = $\frac{\mid TP\mid}{\mid TP\mid + \mid FN\mid }$ (proportion that you predicted correctly among just the exemplary schools)</li>
</ul>

Your competition (F1) score is the harmonic mean of the precision and the recall: $F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$.
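As a quick check on the scoring formulas, here is a minimal sketch that computes all four scores from confusion-matrix counts. The helper name `score` and the counts passed to it are made up for illustration; they are not results from MiddleSchool.xlsx.

```python
def score(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard each ratio so an empty denominator gives 0 instead of an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts for 100 schools, 10 of which are exemplary.
accuracy, precision, recall, f1 = score(tp=8, tn=78, fp=12, fn=2)
print("accuracy =", accuracy)
print("precision =", precision)
print("recall =", recall)
print("F1 =", round(f1, 3))
```

With these counts the accuracy is high because most schools are not exemplary, while the F1 score is much lower. That gap is one reason a competition on an unbalanced target might be scored on F1 rather than accuracy.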

<h3>EXAMPLE</h3>

<h4>STEP ONE: Exploratory Data Analysis</h4>
1a) Import the usual libraries, including matplotlib.pyplot as plt, and load the MiddleSchool report card data into a dataframe df.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_excel("MiddleSchool.xlsx")
df.head(2)

1b) Display the column names and the number of non-missing values in each column.

In [2]:
df.count()
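Because `count()` reports only non-missing values, it can help to look at the total row count and the number of missing values alongside it. The sketch below uses a small made-up frame named `toy` in place of the real data, so the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df; the numbers are invented for illustration.
toy = pd.DataFrame({
    "Teacher Retention Rate": [91.0, 85.5, np.nan],
    "Student Attendance Rate": [95.2, 93.1, 96.0],
})

n_rows, n_cols = toy.shape      # total rows and columns
non_null = toy.count()          # non-missing values per column
missing = toy.isna().sum()      # missing values per column

print("rows:", n_rows, "columns:", n_cols)
print(pd.DataFrame({"non_null": non_null, "missing": missing}))
```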

1c) Get the max and min values in each column.

In [3]:
n1min=df["# Student Enrollment"].min()
n1max=df["# Student Enrollment"].max()
n2min=df["% Student Enrollment - Black or African American"].min()
n2max=df["% Student Enrollment - Black or African American"].max()
n3min=df["% Student Enrollment - Hispanic or Latino"].min()
n3max=df["% Student Enrollment - Hispanic or Latino"].max()
n4min=df["% Student Enrollment - Children with Disabilities"].min()
n4max=df["% Student Enrollment - Children with Disabilities"].max()
n5min=df["% Student Enrollment - Low Income"].min()
n5max=df["% Student Enrollment - Low Income"].max()
n6min=df["Total Number of School Days"].min()
n6max=df["Total Number of School Days"].max()
n7min=df["% 8th Grade Passing Algebra 1"].min()
n7max=df["% 8th Grade Passing Algebra 1"].max()
n8min=df["Student Attendance Rate"].min()
n8max=df["Student Attendance Rate"].max()
n9min=df["Student Chronic Truancy Rate"].min()
n9max=df["Student Chronic Truancy Rate"].max()
n10min=df["Avg Class Size – All Grades"].min()
n10max=df["Avg Class Size – All Grades"].max()
n11min=df["Teacher Retention Rate"].min()
n11max=df["Teacher Retention Rate"].max()
print("min enroll",n1min)
print("max enroll",n1max)
print("min % Student Enrollment - Black or African American",n2min)
print("max % Student Enrollment - Black or African American",n2max)
print("min % Student Enrollment - Hispanic or Latino",n3min)
print("max % Student Enrollment - Hispanic or Latino",n3max)
print("min % Student Enrollment - Children with Disabilities",n4min)
print("max % Student Enrollment - Children with Disabilities",n4max)
print("min % Student Enrollment - Low Income",n5min)
print("max % Student Enrollment - Low Income",n5max)
print("min Total Number of School Days",n6min)
print("max Total Number of School Days",n6max)
print("min % 8th Grade Passing Algebra 1",n7min)
print("max % 8th Grade Passing Algebra 1",n7max)
print("min Student Attendance Rate",n8min)
print("max Student Attendance Rate",n8max)
print("min Student Chronic Truancy Rate",n9min)
print("max Student Chronic Truancy Rate",n9max)
print("min Avg Class Size – All Grades",n10min)
print("max Avg Class Size – All Grades",n10max)
print("min Teacher Retention Rate",n11min)
print("max Teacher Retention Rate",n11max)
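The per-column min and max can also be collected in one table rather than one variable per value. This is a sketch on a made-up frame named `toy`; on the real data the same `agg` call would be applied to `df`.

```python
import pandas as pd

# Toy stand-in for df; the numbers are invented for illustration.
toy = pd.DataFrame({
    "Teacher Retention Rate": [91.0, 85.5, 95.0],
    "Student Attendance Rate": [95.2, 93.1, 96.0],
    "Summative Designation": ["Exemplary", "Commendable", "Targeted"],
})

# Keep only numeric columns, then compute min and max for each one.
summary = toy.select_dtypes(include="number").agg(["min", "max"]).T
print(summary)
```

Transposing with `.T` puts one row per column of the data, which is easier to read when there are many columns.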


1d) Check how many schools are in each category.

In [4]:
df["Summative Designation"].value_counts()
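Since the competition only distinguishes exemplary from non-exemplary schools, the four designations can be collapsed into a 0/1 target. The sketch below uses a short made-up series named `designation`; the share it prints says nothing about the real data.

```python
import pandas as pd

# Made-up designations for illustration.
designation = pd.Series([
    "Exemplary", "Commendable", "Targeted",
    "Commendable", "Comprehensive", "Exemplary",
])

# 1 if the school is exemplary, 0 for every other designation.
is_exemplary = (designation == "Exemplary").astype(int)
share = is_exemplary.mean()

print(is_exemplary.value_counts())
print("share exemplary:", round(share, 3))
```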

<h4> STEP TWO</h4>
Define a function which predicts whether a school is exemplary (1) or not-exemplary (0).

2a) For a simple prediction, let us predict that a school is exemplary if its Teacher Retention Rate is above 90%.

In [5]:
#---PREDICTION MODEL----#
def mypredict(df):
    for i in df.index:
        if df.loc[i,"Teacher Retention Rate"]>90:
            df.loc[i,"Prediction"]=1
        else:
            df.loc[i,"Prediction"]=0
    return df

#---APPLY MODEL TO OUR DATA---#
mydf=mypredict(df)
mydf=mydf.reset_index(drop=True)

#---COMPUTE YOUR SCORE---#
TP=0
TN=0
FP=0
FN=0
numschools=0
for i in mydf.index:
    if mydf.loc[i,"Prediction"]==1 and mydf.loc[i,"Summative Designation"]=="Exemplary":
        TP=TP+1
    if mydf.loc[i,"Prediction"]==0 and mydf.loc[i,"Summative Designation"]!="Exemplary":
        TN=TN+1
    if mydf.loc[i,"Prediction"]==1 and mydf.loc[i,"Summative Designation"]!="Exemplary":
        FP=FP+1
    if mydf.loc[i,"Prediction"]==0 and mydf.loc[i,"Summative Designation"]=="Exemplary":
        FN=FN+1
    numschools=numschools+1
print("|TP|=",TP)
print("|TN|=",TN)
print("|FP|=",FP)
print("|FN|=",FN)
# Keep full precision while computing; round only when printing.
# Each ratio falls back to 0 if its denominator is 0.
accuracy=(TP+TN)/numschools
precision=TP/(TP+FP) if (TP+FP)>0 else 0
recall=TP/(TP+FN) if (TP+FN)>0 else 0
F1score=2*(precision*recall)/(precision+recall) if (precision+recall)>0 else 0
print("Accuracy (% correct of all",numschools,"schools)=",round(100*accuracy,2),"%")
print("Precision (% correct of those you predicted to be exemplary) =",round(100*precision,2),"%")
print("Recall (% correct of schools that are exemplary) =",round(100*recall,2),"%")
print('COMPETITION F1 SCORE=',round(F1score*100,2),"%" )
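The prediction and the four counts above can also be computed without explicit loops by comparing whole columns at once. This is a sketch on a made-up frame named `toy`, using the same "above 90" rule; it is an alternative way to write the cell, not a replacement required by the lab.

```python
import pandas as pd

# Toy stand-in for df; the numbers are invented for illustration.
toy = pd.DataFrame({
    "Teacher Retention Rate": [95.0, 92.0, 88.0, 90.0, 97.0, 70.0],
    "Summative Designation": ["Exemplary", "Commendable", "Exemplary",
                              "Commendable", "Exemplary", "Targeted"],
})

pred = toy["Teacher Retention Rate"] > 90            # predicted exemplary
actual = toy["Summative Designation"] == "Exemplary" # truly exemplary

TP = int((pred & actual).sum())
TN = int((~pred & ~actual).sum())
FP = int((pred & ~actual).sum())
FN = int((~pred & actual).sum())

print("|TP|=", TP, "|TN|=", TN, "|FP|=", FP, "|FN|=", FN)
```

A missing retention rate compares as False here, so such a school is predicted non-exemplary, which matches what the loop version does.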
    

#### ASSIGNMENT

Modify the Prediction Model to see how high you can score.
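One possible starting point is a rule that combines two columns. In the sketch below the function name `mypredict_two_features` and both thresholds are hypothetical placeholders, not tuned values, and the frame `toy` is made up; treat it as a template for experimenting rather than a recommended model.

```python
import pandas as pd


def mypredict_two_features(df, retention_min=90, truancy_max=10):
    """Predict exemplary (1) when retention is high AND truancy is low.

    Both thresholds are arbitrary placeholders to be adjusted.
    """
    out = df.copy()  # leave the caller's dataframe unchanged
    rule = (
        (out["Teacher Retention Rate"] > retention_min)
        & (out["Student Chronic Truancy Rate"] < truancy_max)
    )
    out["Prediction"] = rule.astype(int)
    return out


# Toy stand-in for df; the numbers are invented for illustration.
toy = pd.DataFrame({
    "Teacher Retention Rate": [95.0, 92.0, 88.0, 96.0],
    "Student Chronic Truancy Rate": [5.0, 20.0, 3.0, 8.0],
})
result = mypredict_two_features(toy)
print(result)
```

Changing `retention_min` and `truancy_max`, or swapping in other columns, and rerunning the scoring cell shows how each choice trades precision against recall.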