In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

# Activity 06: Census

The US completes a census every 10 years to take a count of all people living in the country. Then, each year the US Census Bureau creates an updated estimate of that population until the next census is completed. We are still waiting for the US Census Bureau to release data in a .csv format that's suitable for use in a Jupyter Notebook.

In the meantime, load the US Census Bureau's population estimates from the 2019 update ([source](https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/?C=S;O=A)).

In [None]:
full = Table.read_table('data/nc-est2019-agesex-res.csv')
full

You'll notice that there are many variables for each row of this data set. We'll describe them below:

|VARIABLE|DESCRIPTION|
|--------|-----------|
|SEX|Sex|
|AGE|Age|
|CENSUS2010POP|4/1/2010 resident Census 2010 population|
|ESTIMATESBASE2010|4/1/2010 resident population estimates base|
|POPESTIMATE2010|7/1/2010 resident population estimate|
|POPESTIMATE2011|7/1/2011 resident population estimate|
|POPESTIMATE2012|7/1/2012 resident population estimate|
|POPESTIMATE2013|7/1/2013 resident population estimate|
|POPESTIMATE2014|7/1/2014 resident population estimate|
|POPESTIMATE2015|7/1/2015 resident population estimate|
|POPESTIMATE2016|7/1/2016 resident population estimate|
|POPESTIMATE2017|7/1/2017 resident population estimate|
|POPESTIMATE2018|7/1/2018 resident population estimate|
|POPESTIMATE2019|7/1/2019 resident population estimate|

## `SEX`
The key for SEX is as follows:

>0 = Total
>
>1 = Male
>
>2 = Female

## `AGE`
`AGE` is the single year of age (0, 1, 2, ..., 99, 100+ years). The value 999 indicates the total population across all ages.
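
To see why these codes matter, here is a minimal sketch on made-up numbers (not real census values), using plain NumPy arrays in place of table columns. It assumes the `AGE` 999 row holds the sum of the single-year rows, as described above. The array names `age` and `pop` are hypothetical.

```python
import numpy as np

# Toy stand-ins for the AGE column and one population column
# (made-up numbers, NOT census data)
age = np.array([0, 1, 2, 999])
pop = np.array([10, 12, 11, 33])  # the last entry is the total row (AGE == 999)

# Boolean mask that keeps only the single-year-of-age rows
single_year = age < 999

# Summing every row double counts, because the 999 row already holds the total
naive_total = pop.sum()
correct_total = pop[single_year].sum()

print(naive_total, correct_total)
```

The same reasoning applies to `SEX`: rows with `SEX` equal to 0 already combine the male and female rows, so mixing them with the other rows counts people twice.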

## Exploring the Data

Now, let's focus on the populations from the official census in 2010 and the most recent estimates available, from 2019. Select only the columns `SEX`, `AGE`, `CENSUS2010POP`, and `POPESTIMATE2019`.

In [None]:
# Select the columns needed from `full`
partial = full.select('SEX', 'AGE', 'CENSUS2010POP', 'POPESTIMATE2019')
partial

In [None]:
# Rename the columns to make them easier to refer to
simple = partial.relabeled('CENSUS2010POP', '2010').relabeled('POPESTIMATE2019', '2019')
simple

In [None]:
# Line graph of the 2010 male population (SEX = 1) by age; discuss the patterns
to_graph = simple.where('SEX', 1).where('AGE', are.below(999)).drop('SEX')
to_graph.plot('AGE', '2010')

In [None]:
# Line graph of the 2010 female population (SEX = 2) by age; discuss the patterns
to_graph = simple.where('SEX', 2).where('AGE', are.below(999)).drop('SEX')
to_graph.plot('AGE', '2010')

## Percent Change

Create a new table named `changes` that has the following columns: `AGE`, `POPCHANGE`, and `PERCENTCHANGE`. Each row should correspond to a row in the `partial` table where `SEX` is 0 and `AGE` is a single year of age (not 999).

The `POPCHANGE` column should represent the population growth from the 2010 census to the 2019 estimate, measured in people.

The `PERCENTCHANGE` column should represent the percent change from the 2010 census to the 2019 estimate. Use array operations to calculate these columns and table methods to create the table that contains them.
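
As a reminder of how element-wise array arithmetic works, the sketch below computes a change and a percent change on made-up numbers. It assumes the usual definition of percent change, (new - old) / old * 100. The arrays `old` and `new` are hypothetical stand-ins, not the notebook's columns.

```python
import numpy as np

# Made-up starting and ending values (NOT census data)
old = np.array([100, 200, 400])
new = np.array([110, 190, 400])

# Subtraction is applied element by element: one result per pair of values
change = new - old

# Percent change relative to the starting value (assumed definition)
percent = change / old * 100

print(change)
print(percent)
```

In the `datascience` library, `Table.column` returns a NumPy array, so the same kind of arithmetic applies to columns pulled from a table.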

In [None]:
# Create a table with the reduced set of rows and columns
initial = simple.where('SEX', 0).where('AGE', are.below(999)).drop('SEX')
initial

In [None]:
# Create an ARRAY that contains the values for the `POPCHANGE` column
population_change = ...
population_change

In [None]:
# Create an ARRAY that contains the values for the `PERCENTCHANGE` column
percent_change = ...
percent_change

In [None]:
# Create the final table named `changes` that has columns `AGE`, `2010`, `2019`, `POPCHANGE`, and `PERCENTCHANGE`
changes = ...
changes

## Explore Census Data

The US Census Bureau held a competition around the 2020 Census called ["Let's Make it Count"](https://www.letsmakeitcount.org/). It's described as follows:

> ### ASK QUESTIONS. EXPLORE DATA. SHARE INSIGHTS.
> Learn more about Census data at https://census.gov/academy and submit a story here about how Census information and context matters to you and/or your community. 

> Submissions can include, but are not limited to: posters, infographics, essays, captioned photos, interactive or static data visualization(s), apps, and websites.

> Entries will be reviewed by the competition hosts and community leaders for creativity, clarity of message, and the use of Census data for effective storytelling. Submissions should address how the participant has been, is, or will be Asking Questions, Exploring Data, and Sharing Insights.

While we haven't learned a lot about data visualization yet in this course, you should be able to dig into this census data and look for patterns and trends. Create some code cells below and investigate the data. If you find anything of interest, share it to the course discussion board under Activity 06.