# Campus ratings

## A scripted lesson

This notebook takes my [Excel pivot table lesson](https://docs.google.com/document/d/1PRM1ozgbqkq69ZwpRue1ttho-FCHeKKR7Thybz6AAak/edit#heading=h.h6x8isam3qkn) and scripts it using agate. This version uses exactly the same data as the Excel lesson.

## About the data

The Texas Education Agency (TEA) rates public schools based on test scores and other factors. This lesson is based on the 2017 ratings released August 15, 2017. The data file has been cut down to match the data used in the Excel class.

The file, `2017-school-ratings.csv`, is stored in the `data` folder one level up from this notebook (`../data/`).

## Goal

We want to answer three questions with this data:
- What percentage of charter schools received an "Improvement Required" rating, compared with traditional public schools?
- Which schools in Austin ISD received an "Improvement Required" rating?
- Which schools in Region 13 received an "Improvement Required" rating?


In [1]:
import agate
import warnings
warnings.filterwarnings('ignore')

In [2]:
# this dictionary of column types was built from the field definitions for the ratings data
specified_type = {
  'DISTNAME': agate.Text(),
  'CAMPNAME': agate.Text(),
  'District_ID': agate.Text(),
  'Campus_ID': agate.Text(),
  'REGION': agate.Text(),
  'CFLALTED': agate.Text(),
  'C_RATING': agate.Text(),
  'CI1': agate.Number(),
  'CI1_CUT': agate.Number(),
  'CI1_MET': agate.Text(),
  'CI2': agate.Number(),
  'CI2_CUT': agate.Number(),
  'CI2_MET': agate.Text(),
  'CI3': agate.Number(),
  'CI3_CUT': agate.Number(),
  'CI3_MET': agate.Text(),
  'CI4': agate.Number(),
  'CI4_CUT': agate.Number(),
  'CI4_MET': agate.Text(),
}

# import the raw data, applying the column types above
raw = agate.Table.from_csv('../data/2017-school-ratings.csv', column_types=specified_type)


In [3]:
# print columns
print(raw)

| column      | data_type |
| ----------- | --------- |
| DISTNAME    | Text      |
| CAMPNAME    | Text      |
| District_ID | Text      |
| Campus_ID   | Text      |
| REGION      | Text      |
| CFLALTED    | Text      |
| C_RATING    | Text      |
| CI1         | Number    |
| CI1_CUT     | Number    |
| CI1_MET     | Text      |
| CI2         | Number    |
| CI2_CUT     | Number    |
| CI2_MET     | Text      |
| CI3         | Number    |
| CI3_CUT     | Number    |
| CI3_MET     | Text      |
| CI4         | Number    |
| CI4_CUT     | Number    |
| CI4_MET     | Text      |




## Creating a column for charter status

In this data set, we use part of the `Campus_ID` field to tell whether a campus is a charter school. The fourth character is '8' for a charter school and '9' for a traditional public school. Python starts counting string positions at 0, so the fourth character is at index `[3]`.
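
A quick way to see the indexing at work, using two made-up campus IDs (hypothetical values, not rows from the data file). It also hints at why `Campus_ID` is loaded with `agate.Text()`: as text, every character survives, while converting an ID to a number drops any leading zero and shifts the positions.

```python
# Hypothetical campus IDs for illustration only; they are not taken from the data file.
sample_ids = ['012901001', '012801001']


def charter_status(campus_id):
    """Label a campus from the fourth character (index 3) of its ID."""
    # Python strings are zero-indexed, so [3] is the fourth character.
    return 'Charter' if campus_id[3] == '8' else 'Not charter'


for campus_id in sample_ids:
    print(campus_id, campus_id[3], charter_status(campus_id))

# Stored as a number, the leading zero disappears and index 3 points at a different digit.
as_number = str(int(sample_ids[0]))
print(as_number, as_number[3])
```

The notebook's own `set_charter_column` function below does the same comparison; it just receives the single character rather than the whole ID.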

In [4]:
# this is a function for the .compute method below.
# it checks whether the value passed to it is '8'; if
# so, it returns 'Charter', and if not, 'Not charter'.
def set_charter_column(value):
    if value == '8':
        return 'Charter'
    else:
        return 'Not charter'

# We are creating a new column called 'CHARTER'. To get the value to insert
# for each row, we feed the 4th character of the 'Campus_ID' column
# to the set_charter_column function, which tells us what to put in:
# either 'Charter' or 'Not charter'.
# compute() returns a new table, which we call charter_set.
charter_set = raw.compute([
  ('CHARTER', # the name of the new column
   agate.Formula(agate.Text(),
   lambda r: set_charter_column(r['Campus_ID'][3]))
  )
])
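
Conceptually, `compute()` with an `agate.Formula` calls the function once per row and adds the result as a new column, returning a new table instead of changing `raw`. The plain-Python sketch below mimics that idea with a list of dictionaries. The rows and the `compute_column` helper are hypothetical, and this illustrates the concept rather than how agate implements it internally.

```python
# Made-up rows standing in for the table; only the columns we need are included.
rows = [
    {'CAMPNAME': 'EXAMPLE EL', 'Campus_ID': '012901101'},
    {'CAMPNAME': 'EXAMPLE ACADEMY', 'Campus_ID': '012801001'},
]


def set_charter_column(value):
    # Same logic as the notebook's function
    return 'Charter' if value == '8' else 'Not charter'


def compute_column(rows, name, formula):
    """Hypothetical helper: return new rows with an extra column; input rows are untouched."""
    return [{**row, name: formula(row)} for row in rows]


charter_rows = compute_column(rows, 'CHARTER', lambda r: set_charter_column(r['Campus_ID'][3]))
for row in charter_rows:
    print(row['CAMPNAME'], row['CHARTER'])
```

Returning new rows rather than editing the old ones matches the notebook's pattern of keeping `raw`, `charter_set` and later tables side by side, so each step can be inspected on its own.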

In [5]:
# peek at charter records
charter_set.where(lambda row: row['CHARTER'] == 'Charter').select([
        'CAMPNAME',
        'CHARTER'
    ]).limit(5).print_table(max_column_width=None)

| CAMPNAME                           | CHARTER |
| ---------------------------------- | ------- |
| PINEYWOODS COMMUNITY ACADEMY H S   | Charter |
| DR TERRY ROBBINS MIDDLE            | Charter |
| SARAH STRINDEN EL                  | Charter |
| ST MARY'S ACADEMY CHARTER SCHOOL   | Charter |
| RICHARD MILBURN ALTER H S (KILLEEN | Charter |


In [6]:
# peek at non-charter records
charter_set.where(lambda row: row['CHARTER'] == 'Not charter').select([
        'CAMPNAME',
        'CHARTER'
    ]).limit(5).print_table(max_column_width=None)

| CAMPNAME       | CHARTER     |
| -------------- | ----------- |
| CAYUGA H S     | Not charter |
| CAYUGA MIDDLE  | Not charter |
| CAYUGA EL      | Not charter |
| ELKHART H S    | Not charter |
| ELKHART MIDDLE | Not charter |


## Create a column that spells out the ratings

In [7]:
# These are the values for the rating.
# C_RATING is on the left, the definition is on the right
# M=Met Standard, A=Met Alternative Standard, I=Improvement Required, X/Z=Not Rated, T=Not Rated: Annexation
rating_values = {
    'I': 'Improvement required',
    'M': 'Met standard',
    'A': 'Met alternative standard',
    'X': 'Not rated',
    'Z': 'Not rated',
    'T': 'Not rated',
    '': 'Not rated',
}

def map_rating(rating):
    # agate reads blank cells as None, so treat None as an empty code
    rating = (rating or '').strip()
    return rating_values[rating]

rating_set = charter_set.compute([
  ('RATING',
   agate.Formula(agate.Text(),
   lambda r: map_rating(r['C_RATING']))
  )
])
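
One thing to watch: `map_rating` looks codes up with `rating_values[rating]`, so any code missing from the dictionary (a new code in another year's file, say) stops the whole `compute()` with a `KeyError`. Failing loudly is a reasonable choice, since it tells you the mapping needs updating. If you would rather flag unexpected codes and keep going, a variation is `dict.get()` with a fallback label. The `map_rating_safe` name and the `'Unknown code'` label are our own, not TEA values. By default agate's `Text` type converts blank cells to `None`, so the sketch treats `None` as a blank code.

```python
rating_values = {
    'I': 'Improvement required',
    'M': 'Met standard',
    'A': 'Met alternative standard',
    'X': 'Not rated',
    'Z': 'Not rated',
    'T': 'Not rated',
    '': 'Not rated',
}


def map_rating_safe(rating):
    # Treat a missing value as blank, then strip stray whitespace
    code = (rating or '').strip()
    # Fall back to a visible label instead of raising KeyError
    return rating_values.get(code, 'Unknown code: {}'.format(code))


print(map_rating_safe(' M '))  # Met standard
print(map_rating_safe(None))   # Not rated
print(map_rating_safe('Q'))    # Unknown code: Q
```

If you use the fallback, check `values_distinct()` on the new column afterward so an unknown code doesn't slip through unnoticed.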

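## Filter out campuses we don't want

We only want to compare campuses rated under the standard system, so we keep campuses rated "Met standard" or "Improvement required." That drops campuses that met the alternative standard as well as campuses that were not rated.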

In [8]:
rating_filtered = rating_set.where(
    lambda row: row['RATING'] in ('Met standard', 'Improvement required')
)

print('number of all campuses: {}'.format(len(rating_set)))
print('number after filtering: {}'.format(len(rating_filtered)))
print('distinct values in Ratings now: {}'.format(
        rating_filtered.columns['RATING'].values_distinct()
    ))

number of all campuses: 8757
number after filtering: 7951
distinct values in Ratings now: ('Improvement required', 'Met standard')


In [9]:
# keep only campuses whose alternative education flag (CFLALTED) is 'False'
alts_removed = rating_filtered.where(
    lambda row: row['CFLALTED'] == 'False'
)
print('Number before filter: {}'.format(len(rating_filtered)))
print('Number after filter: {}'.format(len(alts_removed)))

Number before filter: 7951
Number after filter: 7932
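
Because `specified_type` loads `CFLALTED` as `agate.Text()`, the flag arrives as a string rather than a Boolean, which is why the filter compares against the string `'False'`. It's an easy thing to trip over: in Python any non-empty string is truthy, so a test like `if row['CFLALTED']:` would keep every campus. The sketch below uses made-up flag values to show the difference.

```python
# Made-up flag values as they would arrive from a Text column
flags = ['False', 'True', 'False']

# Truthiness check: every non-empty string counts as True, even 'False'
truthy = [f for f in flags if f]

# String comparison: what the notebook's filter does
not_alt_ed = [f for f in flags if f == 'False']

print(len(truthy))      # 3
print(len(not_alt_ed))  # 2
```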


In [10]:
print(alts_removed)

| column      | data_type |
| ----------- | --------- |
| DISTNAME    | Text      |
| CAMPNAME    | Text      |
| District_ID | Text      |
| Campus_ID   | Text      |
| REGION      | Text      |
| CFLALTED    | Text      |
| C_RATING    | Text      |
| CI1         | Number    |
| CI1_CUT     | Number    |
| CI1_MET     | Text      |
| CI2         | Number    |
| CI2_CUT     | Number    |
| CI2_MET     | Text      |
| CI3         | Number    |
| CI3_CUT     | Number    |
| CI3_MET     | Text      |
| CI4         | Number    |
| CI4_CUT     | Number    |
| CI4_MET     | Text      |
| CHARTER     | Text      |
| RATING      | Text      |



## Analysis time

Now that our data is filtered, we'll set up our final table and start working with it.

In [11]:
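# our final table for analysis. The commented-out column list below is from the
# full ratings file; this cut-down file lacks most of those columns, so we keep all of them.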
# campus_columns = [
#     'CAMPUS',
#     'CAMPNAME',
#     'GRDTYPE',
#     'DISTNAME',
#     'DISTRICT',
#     'CNTYNAME',
#     'REGNNAME',
#     'REGION',
#     'CHARTER',
#     'RATING',
#     'C_YRS_IR',
#     'CFLCHART',
#     'CI1_MET',
#     'CI2_MET',
#     'CI3_MET',
#     'CI4_MET',
#     'CPETECHP',
#     'CPETECOP',
#     'CPETLEPP',
#     'CPETSPEP',
#     'CPETALLC',
# ]

campus = alts_removed

In [12]:
# pivot on charter status and rating to count the campuses in each combination
campus_pivot = campus.pivot('CHARTER', 'RATING')
campus_pivot.print_table()

| CHARTER     | Met standard | Improvement required |
| ----------- | ------------ | -------------------- |
| Not charter |        7,138 |                  310 |
| Charter     |          442 |                   42 |
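
Given only a key column and a pivot column, agate's `pivot()` counts the rows for each combination of the two (its default aggregation is `agate.Count()`). The same idea in plain Python is a `collections.Counter` over `(CHARTER, RATING)` pairs. The rows below are invented for illustration.

```python
from collections import Counter

# Invented (CHARTER, RATING) pairs; the real table has thousands of rows
rows = [
    ('Not charter', 'Met standard'),
    ('Not charter', 'Met standard'),
    ('Not charter', 'Improvement required'),
    ('Charter', 'Met standard'),
    ('Charter', 'Improvement required'),
]

counts = Counter(rows)

# Lay the counts out like the pivot table: one row per charter status
for status in ('Not charter', 'Charter'):
    met = counts[(status, 'Met standard')]
    required = counts[(status, 'Improvement required')]
    print(status, met, required)
```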


In [13]:
# function to calculate the fail rate: part / total * 100
def fail_rate(row):
    return (row['Improvement required'] / (row['Met standard'] + row['Improvement required'])) * 100

In [14]:
# Create new column with fail rate
campus_charter_rate = campus_pivot.compute([
    ('Fail rate', agate.Formula(agate.Number(), fail_rate))
])

# print the new table
campus_charter_rate.print_table()

| CHARTER     | Met standard | Improvement required | Fail rate |
| ----------- | ------------ | -------------------- | --------- |
| Not charter |        7,138 |                  310 |    4.162… |
| Charter     |          442 |                   42 |    8.678… |
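
As a sanity check, the same formula can be worked by hand from the printed pivot counts (copied in below, not read from the table). Charter campuses were rated "Improvement required" at roughly twice the rate of traditional public schools.

```python
# Counts copied from the pivot table above
counts = {
    'Not charter': {'Met standard': 7138, 'Improvement required': 310},
    'Charter': {'Met standard': 442, 'Improvement required': 42},
}


def fail_rate(met, required):
    # share of rated campuses that got "Improvement required", as a percentage
    return required / (met + required) * 100


rates = {}
for status, c in counts.items():
    rates[status] = fail_rate(c['Met standard'], c['Improvement required'])
    print('{}: {:.3f}%'.format(status, rates[status]))

print('charter rate / non-charter rate: {:.2f}'.format(rates['Charter'] / rates['Not charter']))
```

For example, for traditional public schools that is 310 / (7,138 + 310) = 310 / 7,448, or about 4.162 percent.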


## Next: Filter to see Austin ISD schools that failed

In [15]:
# get just the Austin ISD campuses
austin = campus.where(lambda row: row['DISTNAME'] == 'AUSTIN ISD')
print(len(austin))

117


In [16]:
# count Austin ISD campuses by charter status and rating
austin_pivot = austin.pivot('CHARTER', 'RATING')
austin_pivot.print_table()

| CHARTER     | Met standard | Improvement required |
| ----------- | ------------ | -------------------- |
| Not charter |          113 |                    4 |


In [17]:
# filter the austin list to failed schools
austin_failed = austin.where(lambda row: row['RATING'] == 'Improvement required')

# columns for fail print list
columns_fail_list = [
    'CAMPNAME',
    'DISTNAME',
    'CI1_MET',
    'CI2_MET',
    'CI3_MET',
    'CI4_MET',
]

# print the list of schools
austin_failed.select(columns_fail_list).print_table(max_columns=None)

| CAMPNAME      | DISTNAME   | CI1_MET | CI2_MET | CI3_MET | CI4_MET |
| ------------- | ---------- | ------- | ------- | ------- | ------- |
| BURNET M S    | AUSTIN ISD | N       | N       | N       | Y       |
| MARTIN MIDDLE | AUSTIN ISD | N       | N       | Y       | Y       |
| MENDEZ M S    | AUSTIN ISD | N       | N       | N       | N       |
| GOVALLE EL    | AUSTIN ISD | N       | Y       | N       | Y       |


## Region 13 campuses that failed

Region 13 covers Central Texas schools, including Austin ISD.

In [18]:
region = campus.where(lambda row: row['REGION'] == '13')
print(len(region))

532


In [19]:
region_pivot = region.pivot('CHARTER', 'RATING')
region_pivot.print_table()

| CHARTER     | Met standard | Improvement required |
| ----------- | ------------ | -------------------- |
| Not charter |          475 |                   20 |
| Charter     |           35 |                    2 |
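
The next cell sorts by district and then by campus name by giving `order_by()` a function that returns a tuple. Python compares tuples element by element, so the campus name only breaks ties between campuses in the same district. A small stdlib illustration, using a few district and campus names from this data in scrambled order:

```python
# (DISTNAME, CAMPNAME) pairs, deliberately out of order
campuses = [
    ('BARTLETT ISD', 'BARTLETT SCHOOLS'),
    ('AUSTIN ISD', 'MENDEZ M S'),
    ('AUSTIN ISD', 'BURNET M S'),
    ('AUSTIN ISD', 'GOVALLE EL'),
]

# Sort by district first; campus name only decides ties within a district
ordered = sorted(campuses, key=lambda pair: (pair[0], pair[1]))
for district, name in ordered:
    print(district, '|', name)
```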


In [20]:
# filter to failed schools
region_failed = region.where(lambda row: row['RATING'] == 'Improvement required')

# keep the fail-list columns and sort by district, then campus name
region_sorted = region_failed.select(columns_fail_list).order_by(lambda row: (row['DISTNAME'], row['CAMPNAME']))

# print the list
region_sorted.print_table(max_rows=None, max_columns=None, max_column_width=25)

| CAMPNAME                  | DISTNAME                 | CI1_MET | CI2_MET | CI3_MET | CI4_MET |
| ------------------------- | ------------------------ | ------- | ------- | ------- | ------- |
| BURNET M S                | AUSTIN ISD               | N       | N       | N       | Y       |
| GOVALLE EL                | AUSTIN ISD               | N       | Y       | N       | Y       |
| MARTIN MIDDLE             | AUSTIN ISD               | N       | N       | Y       | Y       |
| MENDEZ M S                | AUSTIN ISD               | N       | N       | N       | N       |
| BARTLETT SCHOOLS          | BARTLETT ISD             | N       | Y       | N       | N       |
| DIME BOX SCHOOL           | DIME BOX ISD             | N       | Y       | N       | Y       |
| BOOKER T WASHINGTON EL    | ELGIN ISD                | N       | Y       | N       | Y       |
| ELGIN EL                  | ELGIN ISD                | N       | Y       | N       | Y       |
| ANNIE PURL EL             | 