# Campus ratings

## A scripted lesson

This notebook takes my [Excel pivot table lesson](https://docs.google.com/document/d/1PRM1ozgbqkq69ZwpRue1ttho-FCHeKKR7Thybz6AAak/edit#heading=h.h6x8isam3qkn) and scripts it using agate.

## About the data

The Texas Education Agency rates public schools based on test scores and other factors. This lesson is based on the 2017 ratings released August 15, 2017. The file we are using, 2017-school-ratings.csv, is a version of the [downloadable data](https://rptsvr1.tea.texas.gov/perfreport/account/2017/download.html), but some processing has been done to cut down and rename the columns we need.

(This data set is actually a bit contrived for the purposes of learning a couple of skills, so perhaps we can come back later and add the preprocessing to this script.)


## Goal

We want to find a number of things from this data:
- What percentage of charter schools received an "Improvement Required" rating, compared to traditional public schools?
- Which schools in Austin ISD received an "Improvement Required" rating?
- Which schools in Region 13 received an "Improvement Required" rating?


In [1]:
import agate

In [2]:
# set column types
specified_types = {
    'District_ID': agate.Text(),
    'Campus_ID': agate.Text(),
    'REGION': agate.Text(),
}

# create table named raw from csv
raw = agate.Table.from_csv('../data/2017-school-ratings.csv', column_types=specified_types)
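
A likely reason for forcing the ID columns to text is that the IDs carry leading zeros, which a numeric type would drop. The notebook doesn't state this, so treat it as an assumption. Losing the zeros would also shift the fourth character that the charter check below depends on. A plain-Python sketch, independent of agate, using a campus ID from this file:

```python
# Plain-Python illustration; agate is not involved here.
campus_id = '001902001'        # kept as text, as in specified_types

as_number = int(campus_id)     # roughly what a numeric column would store
round_trip = str(as_number)    # the leading zeros are gone

print(round_trip)                     # 1902001
print(campus_id[3], round_trip[3])    # 9 2
```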

In [3]:
# print the column names and types
print(raw)

| column      | data_type |
| ----------- | --------- |
| DISTNAME    | Text      |
| CAMPNAME    | Text      |
| District_ID | Text      |
| Campus_ID   | Text      |
| REGION      | Text      |
| CFLALTED    | Boolean   |
| C_RATING    | Text      |
| CI1         | Number    |
| CI1_CUT     | Number    |
| CI1_MET     | Text      |
| CI2         | Number    |
| CI2_CUT     | Number    |
| CI2_MET     | Text      |
| CI3         | Number    |
| CI3_CUT     | Number    |
| CI3_MET     | Text      |
| CI4         | Number    |
| CI4_CUT     | Number    |
| CI4_MET     | Text      |



In [4]:
# looking at the first 5 records in the table
raw.limit(5).print_table()

| DISTNAME    | CAMPNAME       | District_ID | Campus_ID | REGION | CFLALTED | ... |
| ----------- | -------------- | ----------- | --------- | ------ | -------- | --- |
| CAYUGA ISD  | CAYUGA H S     | 001902      | 001902001 | 7      |    False | ... |
| CAYUGA ISD  | CAYUGA MIDDLE  | 001902      | 001902041 | 7      |    False | ... |
| CAYUGA ISD  | CAYUGA EL      | 001902      | 001902103 | 7      |    False | ... |
| ELKHART ISD | ELKHART H S    | 001903      | 001903001 | 7      |    False | ... |
| ELKHART ISD | ELKHART MIDDLE | 001903      | 001903041 | 7      |    False | ... |


## Creating a column for charter status

In this data set, the fourth character of the `Campus_ID` field tells us whether a campus is a charter school: an '8' means it is a charter school, and a '9' means it is not. Python starts counting at 0, so the fourth character sits at index `[3]`.
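
As a quick check of the indexing, here is a minimal plain-Python sketch using two campus IDs that appear in this file. The helper name `is_charter` is hypothetical and is not used elsewhere in the notebook.

```python
def is_charter(campus_id):
    """Return True when the fourth character (index 3) of the ID is '8'."""
    return campus_id[3] == '8'

print(is_charter('003801001'))  # True: characters are 0, 0, 3, 8, ...
print(is_charter('001902001'))  # False: characters are 0, 0, 1, 9, ...
```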

In [5]:
# showing I can get charter schools
# .where is the method. It keeps the rows for which the test is true.
# We are feeding .where a test:
#  For each row, look at the 4th character of 'Campus_ID'; if it is '8', the test is true.
#  If it is not '8', the row is skipped.
raw.where(lambda row: row['Campus_ID'][3] == '8').limit(5).print_table()

| DISTNAME             | CAMPNAME             | District_ID | Campus_ID | REGION | CFLALTED | ... |
| -------------------- | -------------------- | ----------- | --------- | ------ | -------- | --- |
| PINEYWOODS COMMUN... | PINEYWOODS COMMUN... | 003801      | 003801001 | 7      |    False | ... |
| PINEYWOODS COMMUN... | DR TERRY ROBBINS ... | 003801      | 003801042 | 7      |    False | ... |
| PINEYWOODS COMMUN... | SARAH STRINDEN EL    | 003801      | 003801103 | 7      |    False | ... |
| ST MARY'S ACADEMY... | ST MARY'S ACADEMY... | 013801      | 013801101 | 2      |    False | ... |
| RICHARD MILBURN A... | RICHARD MILBURN A... | 014801      | 014801001 | 20     |     True | ... |


In [6]:
# this is a function for the .compute method below
# it evaluates if the value sent to it is '8', and if
# so, then it returns 'Charter'. If not, then 'Not charter'.
def set_charter_column(value):
    if value == '8':
        return 'Charter'
    else:
        return 'Not charter'

# We are creating a new column called 'Charter'. To get the value to insert
# for each row, we are feeding the 4th character of the 'Campus_ID' column
# to the set_charter_column function, which tells us what to put in, either
# 'Charter' or 'Not charter'.
# We put this all into a new table.
charter_set = raw.compute([
  ('Charter', # the name of the new column
   agate.Formula(agate.Text(),
   lambda r: set_charter_column(r['Campus_ID'][3]))
  )
])

In [7]:
# peek at charter records
charter_set.select([
        'Campus_ID',
        'Charter'
    ]).where(lambda row: row['Campus_ID'][3] == '8').limit(5).print_table()

| Campus_ID | Charter |
| --------- | ------- |
| 003801001 | Charter |
| 003801042 | Charter |
| 003801103 | Charter |
| 013801101 | Charter |
| 014801001 | Charter |


In [8]:
# peek at non-charter records
charter_set.select([
        'Campus_ID',
        'Charter'
    ]).where(lambda row: row['Campus_ID'][3] != '8').limit(5).print_table()

| Campus_ID | Charter     |
| --------- | ----------- |
| 001902001 | Not charter |
| 001902041 | Not charter |
| 001902103 | Not charter |
| 001903001 | Not charter |
| 001903041 | Not charter |


## Create column of explained ratings

In [9]:
# These are the values for the rating.
# C_RATING is on the left, the definition is on the right
# M=Met Standard, A=Met Alternative Standard, I=Improvement Required, X/Z=Not Rated, T=Not Rated: Annexation
rating_values = {
    'I': 'Improvement required',
    'M': 'Met standard',
    'A': 'Met alternative standard',
    'X': 'Not rated',
    'Z': 'Not rated',
    'T': 'Not rated',
    '': 'Not rated',
}

def map_rating(rating):
    rating = rating.strip()
    return rating_values[rating]

rating_set = charter_set.compute([
  ('Rating',
   agate.Formula(agate.Text(),
   lambda r: map_rating(r['C_RATING']))
  )
])
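
The `map_rating` function looks each code up directly, so a code missing from `rating_values` would raise a `KeyError`. A more tolerant variant might look like the sketch below. The name `map_rating_safe`, the `'Unknown rating'` default, and the handling of `None` (in case a blank cell is loaded as a null rather than an empty string) are all assumptions for illustration, not part of the original lesson.

```python
rating_values = {
    'I': 'Improvement required',
    'M': 'Met standard',
    'A': 'Met alternative standard',
    'X': 'Not rated',
    'Z': 'Not rated',
    'T': 'Not rated',
    '': 'Not rated',
}

def map_rating_safe(rating, default='Unknown rating'):
    """Map a C_RATING code to its label without raising on odd input."""
    if rating is None:
        # Assumption: treat a missing value the same as a blank code.
        return rating_values['']
    # Strip stray whitespace, then fall back to the default for unknown codes.
    return rating_values.get(rating.strip(), default)

print(map_rating_safe(' M '))  # Met standard
print(map_rating_safe(None))   # Not rated
print(map_rating_safe('Q'))    # Unknown rating
```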

## Filter out campuses we don't want

We don't want to consider campuses that use the alternative standard, so let's filter those out.

In [10]:
rating_filtered = rating_set.where(
    lambda row: row['Rating'] in ('Met standard', 'Improvement required')
)

print('number of all campuses: {}'.format(len(rating_set)))
print('number after filtering: {}'.format(len(rating_filtered)))
print('distinct values in Ratings now: {}'.format(
        rating_filtered.columns['Rating'].values_distinct()
    ))

number of all campuses: 8757
number after filtering: 7951
distinct values in Ratings now: ('Met standard', 'Improvement required')


In [11]:
# filter where CFLALTED is not true
alts_removed = rating_filtered.where(
    lambda row: row['CFLALTED'] is False
)
print('Number before filter: {}'.format(len(rating_filtered)))
print('Number after filter: {}'.format(len(alts_removed)))

Number before filter: 7951
Number after filter: 7932
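
The `is False` test is stricter than a plain `not`. It keeps only rows whose flag is literally `False`, so a row with a missing flag would also be dropped. A small plain-Python sketch with made-up rows (the campus names are hypothetical) shows the difference:

```python
# Hypothetical rows; only the CFLALTED values matter here.
rows = [
    {'CAMPNAME': 'CAMPUS A', 'CFLALTED': False},
    {'CAMPNAME': 'CAMPUS B', 'CFLALTED': True},
    {'CAMPNAME': 'CAMPUS C', 'CFLALTED': None},   # flag missing
]

# Same test as the cell above: keep rows where the flag is exactly False.
strict = [r['CAMPNAME'] for r in rows if r['CFLALTED'] is False]

# A looser test that also keeps rows with a missing flag.
loose = [r['CAMPNAME'] for r in rows if not r['CFLALTED']]

print(strict)  # ['CAMPUS A']
print(loose)   # ['CAMPUS A', 'CAMPUS C']
```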


## Analysis time

Now that our data is filtered, we'll assign it to a new table and start working with it.

In [12]:
# our final table for analysis
campus = alts_removed

In [13]:
# pivot the table based on charter and rating to see the number of records
campus_pivot = campus.pivot('Charter', 'Rating')
campus_pivot.print_table()

| Charter     | Met standard | Improvement required |
| ----------- | ------------ | -------------------- |
| Not charter |        7,138 |                  310 |
| Charter     |          442 |                   42 |
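
Conceptually, this pivot counts how many rows fall into each combination of `Charter` and `Rating`. The sketch below does the same counting in plain Python with `collections.Counter`. The handful of rows is made up for illustration and is not taken from the ratings file.

```python
from collections import Counter

# Hypothetical (Charter, Rating) pairs standing in for table rows.
rows = [
    ('Not charter', 'Met standard'),
    ('Not charter', 'Met standard'),
    ('Not charter', 'Improvement required'),
    ('Charter', 'Met standard'),
    ('Charter', 'Improvement required'),
]

# Count the rows in each (Charter, Rating) combination.
counts = Counter(rows)

print(counts[('Not charter', 'Met standard')])      # 2
print(counts[('Charter', 'Improvement required')])  # 1
```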


In [14]:
# function to create fail rate: part / total * 100
def fail_rate(row):
    return (row['Improvement required'] / (row['Met standard'] + row['Improvement required'])) * 100

# Create new column with fail rate
campus_charter_rate = campus_pivot.compute([
    ('Fail rate', agate.Formula(agate.Number(), fail_rate))
])

# print the new table
campus_charter_rate.print_table()

| Charter     | Met standard | Improvement required | Fail rate |
| ----------- | ------------ | -------------------- | --------- |
| Not charter |        7,138 |                  310 |    4.162… |
| Charter     |          442 |                   42 |    8.678… |
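
The arithmetic can be checked by hand from the pivot counts: the fail rate is the "Improvement required" count divided by the total of both ratings, times 100. A minimal plain-Python sketch using the counts printed above (the helper and variable names are only for this illustration):

```python
# Counts copied from the pivot table above.
pivot_counts = {
    'Not charter': {'Met standard': 7138, 'Improvement required': 310},
    'Charter': {'Met standard': 442, 'Improvement required': 42},
}

def rate_from_counts(met, failed):
    """Fail rate as a percentage: failed / (met + failed) * 100."""
    return failed / (met + failed) * 100

rates = {
    label: rate_from_counts(c['Met standard'], c['Improvement required'])
    for label, c in pivot_counts.items()
}

for label, rate in rates.items():
    print('{}: {:.3f}'.format(label, rate))
# Not charter: 4.162
# Charter: 8.678
```

By this measure, the share of charter campuses rated "Improvement required" is roughly double that of traditional campuses.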


## Next: Filter to see Austin schools that failed

In [15]:
# Get just the Austin ISD schools
austin = campus.where(lambda row: row['DISTNAME'] == 'AUSTIN ISD')
print(len(austin))

117


In [16]:
austin_pivot = austin.pivot('Charter', 'Rating')
austin_pivot.print_table()

| Charter     | Met standard | Improvement required |
| ----------- | ------------ | -------------------- |
| Not charter |          113 |                    4 |


In [17]:
austin_failed = austin.where(lambda row: row['Rating'] == 'Improvement required')

columns_list = [
    'CAMPNAME',
    'Rating',
    'CI1_MET',
    'CI2_MET',
    'CI3_MET',
    'CI4_MET',
]

austin_failed.select(columns_list).print_table()

| CAMPNAME      | Rating               | CI1_MET | CI2_MET | CI3_MET | CI4_MET |
| ------------- | -------------------- | ------- | ------- | ------- | ------- |
| BURNET M S    | Improvement required | N       | N       | N       | Y       |
| MARTIN MIDDLE | Improvement required | N       | N       | Y       | Y       |
| MENDEZ M S    | Improvement required | N       | N       | N       | N       |
| GOVALLE EL    | Improvement required | N       | Y       | N       | Y       |


## Region 13 failings

In [18]:
region = campus.where(lambda row: row['REGION'] == '13')
print(len(region))

532


In [19]:
region_pivot = region.pivot('Charter', 'Rating')
region_pivot.print_table()

| Charter     | Met standard | Improvement required |
| ----------- | ------------ | -------------------- |
| Not charter |          475 |                   20 |
| Charter     |           35 |                    2 |


In [20]:
region_failed = region.where(lambda row: row['Rating'] == 'Improvement required')

region_columns_list = [
    'DISTNAME',
    'CAMPNAME',
    'Charter',
    'CI1_MET',
    'CI2_MET',
    'CI3_MET',
    'CI4_MET',
]

region_sorted = region_failed.select(region_columns_list).order_by(lambda row: (row['DISTNAME'], row['CAMPNAME']))

region_sorted.print_table(max_rows=None, max_column_width=50)

| DISTNAME                 | CAMPNAME                          | Charter     | CI1_MET | CI2_MET | CI3_MET | ... |
| ------------------------ | --------------------------------- | ----------- | ------- | ------- | ------- | --- |
| AUSTIN ISD               | BURNET M S                        | Not charter | N       | N       | N       | ... |
| AUSTIN ISD               | GOVALLE EL                        | Not charter | N       | Y       | N       | ... |
| AUSTIN ISD               | MARTIN MIDDLE                     | Not charter | N       | N       | Y       | ... |
| AUSTIN ISD               | MENDEZ M S                        | Not charter | N       | N       | N       | ... |
| BARTLETT ISD             | BARTLETT SCHOOLS                  | Not charter | N       | Y       | N       | ... |
| DIME BOX ISD             | DIME BOX SCHOOL                   | Not charter | N       | Y       | N       | ... |
| ELGIN ISD                | BOOKER T WASHINGTON EL            | Not charter | N