# Accessing census data using cenpy

## Lecture objectives

1. Show how to access census data using `cenpy`
2. Introduce basic plotting of geographic data with `geopandas`

Rather than querying the Census API directly, we can call it through the `cenpy` package. This is often an easier way to get census data, at least for simple datasets.

The major downside of `cenpy` (at least, for now) is that it doesn't support any data vintage after 2019. For the 2020 census and subsequent American Community Survey releases, you'll have to use the Census Bureau API.
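For vintages that `cenpy` doesn't cover, a direct request to the Census Bureau API is not much more work. The sketch below uses only the standard library. The helper names (`build_acs_url`, `rows_to_records`, `fetch_acs`) are made up for illustration, and the 2021 5-year ACS endpoint and Riverside County FIPS codes (state `06`, county `065`) are assumptions chosen to mirror the example later in this lecture. The API returns a JSON array of arrays whose first row is the header.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.census.gov/data"


def build_acs_url(year, variables, state_fips, county_fips, dataset="acs/acs5"):
    """Build a query URL for every census tract in one county (hypothetical helper)."""
    params = [
        ("get", ",".join(["NAME"] + list(variables))),
        ("for", "tract:*"),
        ("in", f"state:{state_fips}"),
        ("in", f"county:{county_fips}"),
    ]
    # keep ':', '*' and ',' unescaped so the URL stays readable
    return f"{BASE_URL}/{year}/{dataset}?{urlencode(params, safe=':*,')}"


def rows_to_records(payload):
    """Convert the API's list-of-lists (header row first) into a list of dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def fetch_acs(url):
    """Download and parse one query. Requires network access."""
    with urlopen(url) as response:
        return rows_to_records(json.load(response))


url = build_acs_url(2021, ["B25035_001E"], "06", "065")
print(url)
# records = fetch_acs(url)   # uncomment to actually query the API
```

Note that the API returns every value as a string, so numeric columns need converting before analysis. Unlike `cenpy`, this route does not return tract boundaries; those would have to be obtained separately.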

<div class="alert alert-block alert-info">
<h3>Update</h3>
<p>cenpy no longer appears to be maintained, and it does not work with newer versions of geopandas, so the code in this lecture video will probably no longer run. Follow along as a point of information, but don't try to run the code yourself. Hopefully cenpy will be updated in the future.</p>
</div>

In [1]:
import cenpy
from cenpy import products

# create a connection to the American Community Survey
acs = products.ACS()

ChunkedEncodingError: ('Connection broken: IncompleteRead(4089 bytes read, 6151 more expected)', IncompleteRead(4089 bytes read, 6151 more expected))

The [online documentation](https://cenpy-devs.github.io/cenpy/api.html#product-american-community-survey) is helpful in showing the functions that are available. We could also call `help(acs)` or just `acs?`.

The `tables` attribute seems useful, as do the `filter_tables` and `from_county` methods.

In [None]:
# what tables are available?
acs.tables

In [None]:
# Let's map the age of the housing stock
# get all the tables that have "BUILT" in their description
acs.filter_tables('BUILT', by='description')

In [None]:
# table B25035 and variable B25035_001E look promising
# let's see what is here in Riverside County
riverside = products.ACS(2017).from_county('Riverside, CA', level='tract',
                                        variables='B25035_001E')

# you might get a bunch of FutureWarnings, but you can ignore these

In [None]:
# It looks like cenpy gives us a geopandas dataframe
type(riverside)

In [None]:
riverside.head()

In [None]:
# let's rename the census column to something more memorable
riverside.rename(columns={'B25035_001E':'Median year built'}, inplace=True)

In [None]:
riverside.head()

`GEOID` gives the standard census FIPS code, formatted as 2-digit state + 3-digit county + 6-digit tract. Read more in [this quick guide to FIPS codes](https://www.policymap.com/2012/08/tips-on-fips-a-quick-guide-to-geographic-place-codes-part-iii/).
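To make the GEOID layout concrete, here is a minimal sketch that splits a tract-level GEOID into its parts. The function name `split_tract_geoid` and the sample GEOID are illustrative only; the state (`06`, California) and county (`065`, Riverside) prefixes match this lecture's example. The codes are kept as strings because the leading zeros are significant.

```python
from typing import NamedTuple


class TractGeoid(NamedTuple):
    state: str   # 2-digit state FIPS code
    county: str  # 3-digit county FIPS code
    tract: str   # 6-digit tract code


def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into state, county and tract codes."""
    geoid = str(geoid)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return TractGeoid(state=geoid[:2], county=geoid[2:5], tract=geoid[5:])


parts = split_tract_geoid("06065030101")  # illustrative GEOID
print(parts.state, parts.county, parts.tract)
```

If a GEOID column is ever read in as an integer, the leading zero of states such as California is lost, which is why the length check above is worth keeping.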

`cenpy` also returns the geographic boundaries of each census tract as a polygon. This is helpful! And it means that we can plot the data pretty simply.

Here, we use the standard `geopandas` plotting function. We tell it to plot the `Median year built` column on the `ax` object that we create in the same cell.

In [None]:
import matplotlib.pyplot as plt 

# create a matplotlib figure and axis object
fig, ax = plt.subplots(figsize=(20,10))

riverside.plot('Median year built', ax=ax, cmap='plasma', legend=True, 
               legend_kwds={'orientation': 'horizontal'})
ax.set_facecolor('k')

There is much that we could do to improve this map, but let's save that for another time. In general, the best course is to follow the numerous examples for `geopandas` that you'll find online.
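One improvement worth sketching is the handling of missing values. The ACS uses large negative annotation values such as `-666666666` where an estimate could not be computed, and a single such value would stretch the color scale far beyond the real range of years. The plain-Python helper below (the name `clean_year` and the sample values are made up for illustration) treats anything that is not a positive number as missing. Whether a value of zero should also count as missing is an assumption here.

```python
import math


def clean_year(value):
    """Return the year as a float, or NaN if it is missing or not positive."""
    try:
        year = float(value)
    except (TypeError, ValueError):
        return math.nan
    # non-positive values cover ACS annotation codes such as -666666666
    return year if year > 0 else math.nan


raw = ["1985", "-666666666", None, "2004", "0"]  # made-up sample values
cleaned = [clean_year(v) for v in raw]
valid = [y for y in cleaned if not math.isnan(y)]
print(min(valid), max(valid))
```

On a GeoDataFrame the same idea could be applied to the column before plotting, and the `missing_kwds` argument of the `geopandas` plot function can then style the tracts that have no data.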

<div class="alert alert-block alert-info">
<h3>Key Takeaways</h3>
<ul>
  <li>For simple queries, cenpy is a good alternative to the Census Bureau API.</li>
  <li>cenpy also provides a handy way to get the geographic boundaries, for easy plotting.</li>
  <li>However, cenpy only has a limited range of geographies and datasets, so for some tasks you may need to use the API.</li>
</ul>
</div>