# Pivot data: Texas Ethics Commission data

This [data set](https://drive.google.com/file/d/0B8ConnGcXrv8SmlQQzNVOEtmRGc/view) contains the amount of money raised over several years by Texas Legislature candidates, as provided by the Texas Ethics Commission. A reporter later annotated the data set with the race and party of each candidate, and the totals for candidates running for statewide office were removed because they would skew simple statistics.

Your goal is to get some basic totals and also to find out whether minority legislators "earn" a smaller share of contributions than their share of the legislature as a whole. Conversely, do Anglo candidates earn more than their share of the pie?

You'll upload the resulting spreadsheet with a worksheet for each set of questions. Write the actual answers in cells beside the pivot table where applicable.

These are the directions for Excel:

- Q1.1: Who made the most overall?
    - Pivot by candidate, amount. Sort by amount descending.
- Q1.2: Who made the most in the most recent election cycle?
    - Add election cycle to the filter and set it to the most recent cycle.
- Q2: What was the average amount raised each election cycle by Democrats and Republicans?
    - Pivot by Party and average(amount).
    - Make a bar chart of the results.
- Q3: What was the average amount raised overall by race?
    - Pivot by race, average(amount).
- Q4: Of the totals raised, what percentage was raised by race?
    - Pivot by race, sum(amount).
    - Copy the main table results and paste them as values elsewhere on the sheet.
    - Create a Percentage column and write formulas to compute it.
    - Make a pie chart from the results.
- Q5: What percentage of all candidates belongs to each race? (We can't use the "count" feature here, so we'll count a bit more manually. Bonus points to the student who can explain why we can't use a "count" of candidates.)
    - Pivot by candidate. Filter by race.
    - Next to the pivot table, create your own table with columns for Race and Count.
    - Filter by race, then select all the names and note the "Count:" at the bottom right of the screen. Add that number to your table.
    - Create a column for PrcOfTotal and create your formulas.
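
For Q4 and Q5, each percentage is one group's share of a grand total. The trap in Q5 is that the table appears to hold one row per candidate per year, so counting rows counts filings rather than people. Here is a minimal plain-Python sketch of both calculations; the function names are hypothetical, and it uses only the five sample rows printed later in this notebook (all of them Hispanic), so the percentages are trivially 100:

```python
from collections import Counter, defaultdict
from decimal import Decimal

# (Candidate, Race-Ethnicity, Amount) for the five sample rows
ROWS = [
    ("Aaron Pena", "Hispanic", Decimal("35543.13")),
    ("Aaron Pena", "Hispanic", Decimal("70550.00")),
    ("Aaron Pena", "Hispanic", Decimal("58893.18")),
    ("Abel Herrero", "Hispanic", Decimal("167189.73")),
    ("Abel Herrero", "Hispanic", Decimal("538533.00")),
]


def share_of_money(rows):
    """Q4: each race's percentage of all money raised."""
    totals = defaultdict(Decimal)
    for _candidate, race, amount in rows:
        totals[race] += amount
    grand_total = sum(totals.values())
    return {race: total / grand_total * 100 for race, total in totals.items()}


def row_counts(rows):
    """What a plain COUNT measures: rows (candidate-years), not people."""
    return Counter(race for _candidate, race, _amount in rows)


def candidate_counts(rows):
    """Q5: distinct candidates per race."""
    pairs = {(candidate, race) for candidate, race, _amount in rows}
    return Counter(race for _candidate, race in pairs)


def share_of_candidates(rows):
    """Q5: each race's percentage of distinct candidates."""
    counts = candidate_counts(rows)
    total = sum(counts.values())
    return {race: Decimal(n) / total * 100 for race, n in counts.items()}


print(row_counts(ROWS))        # Counter({'Hispanic': 5})
print(candidate_counts(ROWS))  # Counter({'Hispanic': 2})
```

On the full table, a row count would inflate each race by however many years its candidates filed, which is why the Excel steps count names instead.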

How does each race's percentage of the money raised compare to its share of the legislature as a whole?



```python
import agate
import agateexcel  # adds Table.from_xlsx() to agate
import warnings
warnings.filterwarnings('ignore')
```

```python
# read Year as text instead of a number
specified_types = {
    'Year': agate.Text(),
}

# import the data
raw = agate.Table.from_xlsx('../data/tec-totals-cleaned.xlsx', column_types=specified_types)

# print the columns
print(raw)
```

| column         | data_type |
| -------------- | --------- |
| Name           | Text      |
| Candidate      | Text      |
| Party          | Text      |
| Year           | Text      |
| Election cycle | Text      |
| Amount         | Number    |
| Race-Ethnicity | Text      |



```python
# exclude the original Name column; we just need the cleaned one
raw_selected = raw.exclude('Name')
print(raw_selected)
```

| column         | data_type |
| -------------- | --------- |
| Candidate      | Text      |
| Party          | Text      |
| Year           | Text      |
| Election cycle | Text      |
| Amount         | Number    |
| Race-Ethnicity | Text      |



```python
# as a matter of habit, I do all my filtering,
# then reset the table to the last filter
# this way I can adjust filters above without affecting everything below
tec = raw_selected
```

```python
# print the top of the table to see what it looks like
tec.limit(5).print_table()
```

| Candidate    | Party | Year | Election cycle |     Amount | Race-Ethnicity |
| ------------ | ----- | ---- | -------------- | ---------- | -------------- |
| Aaron Pena   | D     | 2009 | 2009-2010      |  35,543.13 | Hispanic       |
| Aaron Pena   | R     | 2010 | 2009-2010      |  70,550.00 | Hispanic       |
| Aaron Pena   | R     | 2011 | 2011-2012      |  58,893.18 | Hispanic       |
| Abel Herrero | D     | 2009 | 2009-2010      | 167,189.73 | Hispanic       |
| Abel Herrero | D     | 2012 | 2011-2012      | 538,533.00 | Hispanic       |


## Q1.1: Who made the most overall?
- Group by candidate
- Create a total raised column with an aggregation
- Order the new table by total raised and print the top records
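
To see what `group_by` plus `agate.Sum` computes, here is the same pivot as a plain-Python sketch. The helper name is hypothetical and it uses only the five sample rows printed above, so the ranking is illustrative rather than the answer:

```python
from collections import defaultdict
from decimal import Decimal

# (Candidate, Amount) pairs from the five sample rows
ROWS = [
    ("Aaron Pena", Decimal("35543.13")),
    ("Aaron Pena", Decimal("70550.00")),
    ("Aaron Pena", Decimal("58893.18")),
    ("Abel Herrero", Decimal("167189.73")),
    ("Abel Herrero", Decimal("538533.00")),
]


def totals_by_candidate(rows):
    """Group rows by candidate, sum Amount, and sort largest total first."""
    totals = defaultdict(Decimal)
    for candidate, amount in rows:
        totals[candidate] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for candidate, total in totals_by_candidate(ROWS):
    print(candidate, total)
# Abel Herrero 705722.73
# Aaron Pena 164986.31
```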

```python
# group the data by candidate
tec_candidates = tec.group_by('Candidate')

# sum the Amount column for each candidate
candidate_totals = tec_candidates.aggregate([
    ('Total raised', agate.Sum('Amount')),
])

# print the top five candidates, largest Total raised first
candidate_totals.order_by('Total raised', reverse=True).limit(5).print_table()
```

| Candidate          | Total raised |
| ------------------ | ------------ |
| Todd A. Hunter     | 3,407,905.19 |
| John Whitmire      | 3,407,483.58 |
| John Carona        | 3,346,511.95 |
| Juan Chuy Hinojosa | 3,212,715.51 |
| Kirk Watson        | 3,134,486.07 |


## Q1.2: Who made the most in the most recent election cycle?
- View the distinct cycle values
- Group by both candidate and election cycle
- Create an aggregate of total raised
- Sort by total raised
- Filter to the most recent cycle
- Print the top record
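
Rather than typing '2013-2014' by hand, the latest cycle can be picked from the distinct values: every label has the same `YYYY-YYYY` shape, so the largest string is the most recent (in this notebook, `max(tec.columns['Election cycle'].values_distinct())` should return '2013-2014'). Filtering to that cycle before grouping also means only one cycle gets summed. A plain-Python sketch with a hypothetical helper name; the five sample rows only cover 2009-2010 and 2011-2012, so here the "latest" is 2011-2012:

```python
from collections import defaultdict
from decimal import Decimal

# (Candidate, Election cycle, Amount) from the five sample rows
ROWS = [
    ("Aaron Pena", "2009-2010", Decimal("35543.13")),
    ("Aaron Pena", "2009-2010", Decimal("70550.00")),
    ("Aaron Pena", "2011-2012", Decimal("58893.18")),
    ("Abel Herrero", "2009-2010", Decimal("167189.73")),
    ("Abel Herrero", "2011-2012", Decimal("538533.00")),
]


def top_in_latest_cycle(rows):
    """Return (latest cycle, top candidate, their total) for that cycle."""
    # 'YYYY-YYYY' labels sort chronologically as plain strings
    latest = max(cycle for _candidate, cycle, _amount in rows)
    totals = defaultdict(Decimal)
    for candidate, cycle, amount in rows:
        if cycle == latest:  # filter first, then group and sum
            totals[candidate] += amount
    winner = max(totals, key=totals.get)
    return latest, winner, totals[winner]


print(top_in_latest_cycle(ROWS))
# ('2011-2012', 'Abel Herrero', Decimal('538533.00'))
```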

```python
# view the distinct election cycles
tec.columns['Election cycle'].values_distinct()
```

('2009-2010', '2013-2014', '2011-2012')

```python
# group by candidate, then by election cycle within each candidate
tec_cand_cycle = tec.group_by('Candidate').group_by('Election cycle')

# sum Amount for each candidate/cycle pair
tec_cand_cycle_totals = tec_cand_cycle.aggregate([
    ('Total raised', agate.Sum('Amount')),
])

# sort the resulting table by Total raised, largest first
tec_cand_cycle_total_sort = tec_cand_cycle_totals.order_by('Total raised', reverse=True)

# print the result
tec_cand_cycle_total_sort.limit(5).print_table()
```

| Candidate         | Election cycle | Total raised |
| ----------------- | -------------- | ------------ |
| Konni Burton      | 2013-2014      | 2,742,737.50 |
| Brandon Creighton | 2013-2014      | 1,983,895.78 |
| Larry Taylor      | 2011-2012      | 1,932,719.22 |
| Sylvia Garcia     | 2013-2014      | 1,831,889.32 |
| Kelly Hancock     | 2011-2012      | 1,712,407.49 |


```python
# the top row is already from the most recent cycle,
# but we filter for it anyway to get a single answer

# filter the sorted table to the '2013-2014' cycle
tec_sorted_latest = tec_cand_cycle_total_sort.where(lambda row: row['Election cycle'] == '2013-2014')

tec_sorted_latest.limit(1).print_table()
```

| Candidate    | Election cycle | Total raised |
| ------------ | -------------- | ------------ |
| Konni Burton | 2013-2014      |  2,742,737.5 |


## Q2: What was the average amount raised each election cycle by Democrats and Republicans?
- Group by party and election cycle
- Create an aggregate of the average
- Make a chart of the results
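
A plain-Python sketch of the same grouped average, with a hypothetical helper name and the five sample rows. Like `agate.Mean` below, it averages rows; since a candidate can have a row for each year of a two-year cycle (Aaron Pena has both a 2009 and a 2010 row in 2009-2010), the result looks like an average per filing year rather than per candidate per cycle:

```python
from collections import defaultdict
from decimal import Decimal

# (Party, Election cycle, Amount) from the five sample rows
ROWS = [
    ("D", "2009-2010", Decimal("35543.13")),
    ("R", "2009-2010", Decimal("70550.00")),
    ("R", "2011-2012", Decimal("58893.18")),
    ("D", "2009-2010", Decimal("167189.73")),
    ("D", "2011-2012", Decimal("538533.00")),
]


def mean_by_party_cycle(rows):
    """Average Amount per (Party, Election cycle), one value per row."""
    sums = defaultdict(Decimal)
    counts = defaultdict(int)
    for party, cycle, amount in rows:
        sums[(party, cycle)] += amount
        counts[(party, cycle)] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


for (party, cycle), mean in mean_by_party_cycle(ROWS).items():
    print(party, cycle, mean)
```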

```python
tec_party_cycle = tec.group_by('Party').group_by('Election cycle')

# note: despite its name, 'Total raised' holds the average (agate.Mean), not a sum
tec_party_totals = tec_party_cycle.aggregate([
    ('Total raised', agate.Mean('Amount')),
])

tec_party_totals.print_table()
```

| Party | Election cycle | Total raised |
| ----- | -------------- | ------------ |
| D     | 2009-2010      | 155,672.840… |
| D     | 2011-2012      | 185,343.738… |
| D     | 2013-2014      | 176,097.873… |
| R     | 2009-2010      | 215,121.064… |
| R     | 2011-2012      | 213,314.860… |
| R     | 2013-2014      | 231,270.693… |


```python
# rounding is more complicated than it should be in agate:
# we have to create a rounding function and then compute a new column with it

# Decimal comes from Python's standard library
from decimal import Decimal

# round a row's 'Total raised' value to a whole number
def round_column(row):
    return row['Total raised'].quantize(Decimal('0'))

# create a new table with the rounded column
tec_party_totals_rnd = tec_party_totals.compute([
    ('Total', agate.Formula(agate.Number(), round_column))
])

# print table to view it
tec_party_totals_rnd.print_table()

# select the columns we want into a clean table
tec_party_totals_data = tec_party_totals_rnd.select(['Party', 'Election cycle', 'Total'])
tec_party_totals_data.print_table()
```

| Party | Election cycle | Total raised |   Total |
| ----- | -------------- | ------------ | ------- |
| D     | 2009-2010      | 155,672.840… | 155,673 |
| D     | 2011-2012      | 185,343.738… | 185,344 |
| D     | 2013-2014      | 176,097.873… | 176,098 |
| R     | 2009-2010      | 215,121.064… | 215,121 |
| R     | 2011-2012      | 213,314.860… | 213,315 |
| R     | 2013-2014      | 231,270.693… | 231,271 |

| Party | Election cycle |   Total |
| ----- | -------------- | ------- |
| D     | 2009-2010      | 155,673 |
| D     | 2011-2012      | 185,344 |
| D     | 2013-2014      | 176,098 |
| R     | 2009-2010      | 215,121 |
| R     | 2011-2012      | 213,315 |
| R     | 2013-2014      | 231,271 |
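
One detail of the `quantize` call above: without a `rounding` argument it uses the current decimal context, which defaults to `ROUND_HALF_EVEN`, so an exact .5 goes to the nearest even number. Excel's ROUND sends ties away from zero, which `decimal` calls `ROUND_HALF_UP`. None of the averages above end in exactly .5, so they are unaffected, but if you want the notebook and the spreadsheet to agree on ties, a sketch (the helper name is hypothetical):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_whole(value, mode=ROUND_HALF_EVEN):
    """Round a Decimal to a whole number using the given rounding mode."""
    return value.quantize(Decimal("0"), rounding=mode)


for raw in ["2.5", "3.5", "155672.840"]:
    value = Decimal(raw)
    # default (half-even) vs Excel-style (half away from zero)
    print(raw, round_whole(value), round_whole(value, ROUND_HALF_UP))
```

Inside the notebook, that would mean passing `rounding=ROUND_HALF_UP` to the `quantize` call in `round_column`.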


### I'm stuck on the chart

leather, the charting library agate uses, doesn't seem to have a grouped (multi-column) chart.

I'm having trouble with matplotlib.
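
matplotlib has no single grouped-column call; one common approach is to draw one `ax.bar` series per party and shift each series sideways by one bar width. A sketch under these assumptions: the data is the rounded Q2 table typed in by hand (in the notebook you could probably build `ROWS` from `tec_party_totals_data.rows` instead), and the function names and output file name are made up for illustration:

```python
# Rounded averages from the Q2 table (Party, Election cycle, Total)
ROWS = [
    ("D", "2009-2010", 155673),
    ("D", "2011-2012", 185344),
    ("D", "2013-2014", 176098),
    ("R", "2009-2010", 215121),
    ("R", "2011-2012", 213315),
    ("R", "2013-2014", 231271),
]


def to_series(rows):
    """Reshape long rows into (cycles, {party: [value per cycle]})."""
    cycles = sorted({cycle for _party, cycle, _value in rows})
    parties = sorted({party for party, _cycle, _value in rows})
    lookup = {(party, cycle): value for party, cycle, value in rows}
    series = {p: [lookup.get((p, c), 0) for c in cycles] for p in parties}
    return cycles, series


def grouped_column_chart(cycles, series, path="party_averages.png"):
    """Draw one cluster per cycle with one column per party; save to path."""
    import matplotlib.pyplot as plt  # imported here so to_series works without it

    width = 0.8 / len(series)  # each cluster fills 80% of its slot
    fig, ax = plt.subplots()
    for i, (party, values) in enumerate(series.items()):
        positions = [x + i * width for x in range(len(cycles))]
        ax.bar(positions, values, width, label=party)
    # centre each tick label under its cluster
    offset = width * (len(series) - 1) / 2
    ax.set_xticks([x + offset for x in range(len(cycles))])
    ax.set_xticklabels(cycles)
    ax.set_ylabel("Average amount raised ($)")
    ax.legend(title="Party")
    fig.savefig(path)
    return fig


cycles, series = to_series(ROWS)
try:
    grouped_column_chart(cycles, series)
except ImportError:
    print("matplotlib is not installed")
```

Keeping the reshaping separate from the drawing makes it easy to check the numbers before plotting, and the same `to_series` output works for any number of parties.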