# 2010 census populations from CSV

This version starts with a CSV file, `/data/DEC_10_SF1_TX_County_population.csv`, the same file used for the Excel version of this assignment.

The data set is the 2010 population by Hispanic ethnicity and race for all Texas counties, limited to select columns from the full data set. The Hidalgo County row was edited to remove the annotation on its total population.

## The quest
- Calculate each ethnicity's/race's percentage share of the population in each county.
- Find which county had the highest population share for each ethnicity/race.


In [1]:
import agate
from decimal import Decimal
import warnings
warnings.filterwarnings('ignore')

In [2]:
column_names = ['Id', 'Id2', 'Geography',
                'Total pop', 'Hispanic', 'Not Hispanic', 'White', 'Black',
                'Indian', 'Asian', 'Hawaiian', 'Other', 'Two or more']
column_types = [agate.Text(), agate.Text(), agate.Text(), 
                agate.Number(), agate.Number(), agate.Number(), agate.Number(), agate.Number(),
                agate.Number(), agate.Number(), agate.Number(), agate.Number(), agate.Number(), ]

raw = agate.Table.from_csv(
    '../data/DEC_10_SF1_TX_County_population.csv',
    column_names,
    column_types,
    skip_lines=2)

In [3]:
print(raw)

| column       | data_type |
| ------------ | --------- |
| Id           | Text      |
| Id2          | Text      |
| Geography    | Text      |
| Total pop    | Number    |
| Hispanic     | Number    |
| Not Hispanic | Number    |
| White        | Number    |
| Black        | Number    |
| Indian       | Number    |
| Asian        | Number    |
| Hawaiian     | Number    |
| Other        | Number    |
| Two or more  | Number    |



In [7]:
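# This is used by the .compute method below.
# Given a column name, it returns a function that calculates
# that column's percentage of 'Total pop' for a row,
# then rounds the result to two decimal places with .quantize.
def make_percentage(column):
    return lambda r: ((r[column] / r['Total pop']) * 100).quantize(Decimal('0.01'))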
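
A side note on the rounding step: `Decimal.quantize` rounds using the current decimal context, which defaults to round-half-to-even, so a value sitting exactly on a tie at the third decimal place does not always round up. The sketch below uses made-up values (not census figures) to show the difference. Passing `ROUND_HALF_UP` is only one option if conventional rounding is preferred; the notebook itself does not do this.

```python
from decimal import Decimal, ROUND_HALF_UP

cents = Decimal('0.01')

# Made-up values that sit exactly on a tie at the third decimal place.
tie_a = Decimal('2.665')
tie_b = Decimal('2.675')

# The default context uses ROUND_HALF_EVEN: ties go to the even digit.
half_even_a = tie_a.quantize(cents)
half_even_b = tie_b.quantize(cents)

# Explicit ROUND_HALF_UP: ties round away from zero.
half_up_a = tie_a.quantize(cents, rounding=ROUND_HALF_UP)

print(half_even_a, half_even_b, half_up_a)  # 2.66 2.68 2.67
```

For ranking counties the difference is at most 0.01 of a percentage point, so it only matters when two counties are nearly tied.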


In [8]:
# This creates a new percentage column for each ethnicity/race.
percentages = raw.compute([
    ('Hispanic %', agate.Formula(agate.Number(), make_percentage('Hispanic'))),
    ('White %', agate.Formula(agate.Number(), make_percentage('White'))),
    ('Black %', agate.Formula(agate.Number(), make_percentage('Black'))),
    ('Indian %', agate.Formula(agate.Number(), make_percentage('Indian'))),
    ('Asian %', agate.Formula(agate.Number(), make_percentage('Asian'))),
    ('Hawaiian %', agate.Formula(agate.Number(), make_percentage('Hawaiian'))),
    ('Other %', agate.Formula(agate.Number(), make_percentage('Other'))),
    ('Two or more %', agate.Formula(agate.Number(), make_percentage('Two or more'))),
])

# We should be able to refactor this as a loop through an original list of columns.
# Part of what I don't know is how to create a new column_name from the column
# passed in, among other issues.
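
One possible way to do the refactor described in the comment above: build each new column name from the source column name with string formatting, and let `make_percentage` capture the column in its closure. This is a sketch in plain Python, with a made-up row standing in for an agate row; `race_columns`, `percentage_specs`, and `sample_row` are hypothetical names, not part of the notebook. Each `(name, function)` pair could then be wrapped as `(name, agate.Formula(agate.Number(), function))` and the resulting list passed to `raw.compute`, following the same pattern as the hand-written list.

```python
from decimal import Decimal

def make_percentage(column):
    # Same calculation as the notebook's helper:
    # share of 'Total pop', rounded to two decimal places.
    return lambda r: ((r[column] / r['Total pop']) * 100).quantize(Decimal('0.01'))

# Source columns to convert (names taken from column_names in the notebook).
race_columns = ['Hispanic', 'White', 'Black', 'Indian',
                'Asian', 'Hawaiian', 'Other', 'Two or more']

# Build (new column name, function) pairs in a loop.
percentage_specs = [('{} %'.format(col), make_percentage(col))
                    for col in race_columns]

# Made-up row for illustration only; these are not census figures.
sample_row = {'Total pop': Decimal('200'), 'Hispanic': Decimal('50')}

name, func = percentage_specs[0]
print(name, func(sample_row))  # Hispanic % 25.00
```

Calling `make_percentage(col)` inside the loop, rather than writing the lambda inline, avoids Python's late-binding closure behavior, where every inline lambda would otherwise see the last value of `col`.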

In [10]:
# Setting up the percentage columns for the print function below.
# Note: 'Indian %' is not in this list, so no table is printed for it.
races = [
    'Hispanic %',
    'White %',
    'Black %',
    'Asian %',
    'Hawaiian %',
    'Other %',
    'Two or more %',
]

# Function to print a sentence and a three-row table for each percent column.
# The parameter is the list of columns to loop over.
def print_winners(races):
    for race in races:
        print('County with highest {}\n'.format(race))
        percentages.select(['Geography', race]).order_by(race, reverse=True).print_table(3)
        print('\n')

print_winners(races)

County with highest Hispanic %

| Geography            | Hispanic % |
| -------------------- | ---------- |
| Webb County, Texas   |      95.74 |
| Maverick County, ... |      95.68 |
| Starr County, Texas  |      95.68 |
| ...                  |        ... |


County with highest White %

| Geography            | White % |
| -------------------- | ------- |
| Clay County, Texas   |   92.46 |
| Armstrong County,... |   90.74 |
| Roberts County, T... |   90.53 |
| ...                  |     ... |


County with highest Black %

| Geography            | Black % |
| -------------------- | ------- |
| Jefferson County,... |   33.50 |
| Houston County, T... |   25.83 |
| Falls County, Texas  |   24.98 |
| ...                  |     ... |


County with highest Asian %

| Geography            | Asian % |
| -------------------- | ------- |
| Fort Bend County,... |   16.87 |
| Collin County, Texas |   11.16 |
| Denton County, Texas |    6.50 |
| ...                  |     ... |


County with hig