# Getting census data

## Lecture objectives

1. Provide more experience with APIs and `requests`
2. Demonstrate how to access census data

Traditionally, if you wanted census data, you had to download .csv or other files and decipher them. More recently, the Census Bureau has introduced an API. [See the documentation here](https://www.census.gov/data/developers/guidance/api-user-guide.Example_API_Queries.html).

If you make more than 500 queries a day, you'll need to register for a (free) [API key](https://www.census.gov/data/developers/guidance/api-user-guide.Help_&_Contact_Us.html) from the Census Bureau.
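As a minimal sketch of how a key might be supplied: the Census API accepts it as a `key` query parameter. The helper name `with_api_key` and the environment variable `CENSUS_API_KEY` are assumptions made for illustration; they are not part of the Census documentation.

```python
import os

def with_api_key(params, env_var="CENSUS_API_KEY"):
    """Return a copy of params, adding a 'key' entry when env_var is set."""
    out = dict(params)
    key = os.environ.get(env_var)  # hypothetical variable name
    if key:
        out["key"] = key
    return out

# Query parameters for a county population request
params = with_api_key({"get": "B01001_001E", "for": "county"})
print(sorted(params))
```

The resulting dictionary could then be passed to `requests.get(base_url, params=params)`. Reading the key from the environment keeps it out of a notebook that you might share.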

Let's download population by county from the 2021 American Community Survey five-year estimates. We see from the documentation that the API call takes the following form:

`https://api.census.gov/data/YEAR/acs/DATASET?get=VARIABLES&for=GEOGRAPHY`

So the request for total population (variable `B01001_001E`) in all counties is:

`https://api.census.gov/data/2021/acs/acs5?get=B01001_001E&for=county`

Try this in your browser.
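The URL pattern above can also be assembled programmatically, which helps once you start varying the year or the variables. This is a sketch only: `census_url` is a hypothetical helper, and it assumes that multiple variables are passed as a comma-separated list in `get`.

```python
from urllib.parse import urlencode

def census_url(year, dataset, variables, geography):
    """Build an ACS API URL following the pattern shown above."""
    base = f"https://api.census.gov/data/{year}/acs/{dataset}"
    # Keep commas, colons, and asterisks readable rather than percent-encoded
    query = urlencode({"get": ",".join(variables), "for": geography}, safe=",:*")
    return f"{base}?{query}"

url = census_url(2021, "acs5", ["B01001_001E"], "county")
print(url)
# → https://api.census.gov/data/2021/acs/acs5?get=B01001_001E&for=county
```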

Now let's get it into Python.

In [1]:
import requests
r = requests.get('https://api.census.gov/data/2021/acs/acs5?get=B01001_001E&for=county')
print(type(r.text))
# This time, the data appear to come back as a plain string
print(r.text)

<class 'str'>
[["B01001_001E","state","county"],
["58239","01","001"],
["227131","01","003"],
["25259","01","005"],
["22412","01","007"],
["58884","01","009"],
["10386","01","011"],
["19181","01","013"],
["116425","01","015"],
["34834","01","017"],
["24975","01","019"],
["44857","01","021"],
["12792","01","023"],
["23346","01","025"],
["14184","01","027"],
["15046","01","029"],
["53043","01","031"],
["56789","01","033"],
["11778","01","035"],
["10442","01","037"],
["37490","01","039"],
["13300","01","041"],
["87129","01","043"],
["49443","01","045"],
["39162","01","047"],
["71554","01","049"],
["87146","01","051"],
["36879","01","053"],
["103468","01","055"],
["16365","01","057"],
["32034","01","059"],
["26604","01","061"],
["7851","01","063"],
["14819","01","065"],
["17165","01","067"],
["106355","01","069"],
["52548","01","071"],
["672550","01","073"],
["13929","01","075"],
["93342","01","077"],
["33089","01","079"],
["172223","01","081"],
["101217","01","083"],
["10334","01","085"],

In [2]:
# But it turns out that the string is actually JSON, which requests can parse for us
censusdata = r.json()
type(censusdata)

list
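To see what `r.json()` is doing, here is a small sketch that parses the same kind of text with the standard-library `json` module. The variable `sample_text` is an assumption for illustration: it is just the first three rows of the output above pasted in as a string, so the example runs without a network connection.

```python
import json

# First rows of the response text shown above, pasted in as a string
sample_text = '[["B01001_001E","state","county"],\n["58239","01","001"],\n["227131","01","003"]]'

parsed = json.loads(sample_text)
print(type(parsed))  # → <class 'list'>
print(parsed[0])     # → ['B01001_001E', 'state', 'county']

# With a live response, calling r.raise_for_status() before r.json()
# raises an exception on an HTTP error instead of trying to parse an error page.
```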

In [3]:
# The parsed JSON is a list of lists; the first sublist holds the column headers
censusdata[:5] # show the first five rows

[['B01001_001E', 'state', 'county'],
 ['58239', '01', '001'],
 ['227131', '01', '003'],
 ['25259', '01', '005'],
 ['22412', '01', '007']]
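Because the first sublist holds the headers, each remaining row can be paired with them using plain Python. The sketch below works on a three-row copy of the data (named `sample` here so that it does not overwrite `censusdata`).

```python
# First rows of censusdata, copied from the output above
sample = [
    ["B01001_001E", "state", "county"],
    ["58239", "01", "001"],
    ["227131", "01", "003"],
]

# Split off the header row, then pair each data row with the header names
header, *rows = sample
records = [dict(zip(header, row)) for row in rows]
print(records[0])
# → {'B01001_001E': '58239', 'state': '01', 'county': '001'}
```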

In [4]:
# So we can also convert this to a pandas dataframe, if we use the first list as the column names
# Note that the state and county are shown by their FIPS codes
import pandas as pd
df = pd.DataFrame(censusdata[1:], columns=censusdata[0])
df

Unnamed: 0,B01001_001E,state,county
0,58239,01,001
1,227131,01,003
2,25259,01,005
3,22412,01,007
4,58884,01,009
...,...,...,...
3216,54544,72,145
3217,8317,72,147
3218,22341,72,149
3219,31047,72,151
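One thing to watch: the API returns every value as a string, including the population estimates. The sketch below, run on a small sample rather than the full `df`, converts the estimate to an integer and joins the state and county codes into a five-digit county FIPS code. The column name `fips` is an assumption for illustration, and the conversion assumes every value parses as an integer.

```python
import pandas as pd

sample = [
    ["B01001_001E", "state", "county"],
    ["58239", "01", "001"],
    ["227131", "01", "003"],
]
sample_df = pd.DataFrame(sample[1:], columns=sample[0])

# Convert the estimate from string to integer so that arithmetic works
sample_df["B01001_001E"] = sample_df["B01001_001E"].astype(int)

# Keep the codes as strings (to preserve leading zeros) and concatenate them
sample_df["fips"] = sample_df["state"] + sample_df["county"]

print(sample_df["B01001_001E"].sum())  # → 285370
print(list(sample_df["fips"]))         # → ['01001', '01003']
```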


Let's rename the population column to something more meaningful. `pandas` DataFrames have a helpful `rename` method.

In [5]:
df.rename?

Signature:
df.rename(
    mapper: 'Renamer | None' = None,
    *,
    index: 'Renamer | None' = None,
    columns: 'Renamer | None' = None,
    axis: 'Axis | None' = None,
    copy: 'bool | None' = None,
    inplace: 'bool' = False,
    level: 'Level | None' = None,
    errors: 'IgnoreRaise' = 'ignore',
) -> 'DataFrame | None'
Docstring:
Rename columns or index labels.

Function / dict values must be unique (1-to-1). Labels not contained in
a dict / Series will be left as-is. Extra labels listed don't throw an
error.

See the :ref:`user guide <basics.rename>` for more.

Parameters
----------
mapper : dict-like or function
    Dict-like or function transformations to apply to
    that axis' values. Use either ``mapper`` and 

In [6]:
# note the inplace keyword changes the dataframe in place, rather than returning a copy
df.rename(columns = {'B01001_001E':'population'}, inplace=True)
df

Unnamed: 0,population,state,county
0,58239,01,001
1,227131,01,003
2,25259,01,005
3,22412,01,007
4,58884,01,009
...,...,...,...
3216,54544,72,145
3217,8317,72,147
3218,22341,72,149
3219,31047,72,151
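As an alternative to `inplace=True`, you can assign the result of `rename` to a new name, which leaves the original DataFrame untouched. The sketch below uses a one-row sample and also shows the `errors` parameter from the signature above: by default a misspelled column name is silently ignored, while `errors="raise"` turns it into a `KeyError`.

```python
import pandas as pd

sample_df = pd.DataFrame({"B01001_001E": ["58239"], "state": ["01"], "county": ["001"]})

# Assign the renamed copy instead of modifying sample_df in place
renamed = sample_df.rename(columns={"B01001_001E": "population"})
print(list(renamed.columns))    # → ['population', 'state', 'county']
print(list(sample_df.columns))  # → ['B01001_001E', 'state', 'county']

# A deliberately misspelled column name, to show errors="raise"
try:
    sample_df.rename(columns={"B01001_01E": "population"}, errors="raise")
except KeyError as exc:
    print("rename failed:", exc)
```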


<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Explore the Census Bureau API documentation. How would you get, for example, census-tract-level estimates of population by gender or race?
</div>

<div class="alert alert-block alert-info">
<h3>Key Takeaways</h3>
<ul>
  <li>Getting census data is one of the most common tasks we'll do in this course.</li>
  <li>The Census Bureau has a well-documented API that may be useful for more specialized queries.</li>
</ul>
</div>