<a href="https://colab.research.google.com/github/christophermalone/DSCI325/blob/main/Module6_Part3_CensusAPI_Bash.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Census API - BASH

Bash is a command-line shell and scripting language that is widely available on a variety of operating systems, e.g. Linux, Mac, Windows.  Bash has some efficiencies that other languages do not, especially regarding the management of files.

One method of getting a file from a server with bash is **curl**.  Curl stands for Client URL and is a command-line tool for transferring data using various network protocols.  Curl allows one to download or upload files using URL syntax.


Source: https://en.wikipedia.org/wiki/CURL

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Example 6.3.Bash
Consider an investigation of the proportion of people who bicycle to work. Data for this investigation will be obtained from the US Census Bureau, specifically the American Community Survey (ACS) 5-Year Subject tables.
 

*   Data Source: American Community Survey - 5 Year data
*   Census Unit:  Census Tract
*   Variables: NAME of Census Tract; S0801_C01_001E, i.e. number of workers age 16+; S0801_C01_009E, i.e. % of workers who use public transportation to get to work; S0801_C01_011E, i.e. % of workers who bicycle to work
*   Other:  for = tract:*, i.e. all tracts, in = state:01, i.e. FIPS code for AL

Note:  There are a very large number of census tracts in the US; thus, specifying a state is required.


Source: https://www.census.gov/data/developers/data-sets/acs-5year.html

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

**Goal**: Obtain a Top 10 List for census tracts that have the highest proportion of people who bicycle to work.

## Preparing a Single File

Consider an initial call to the Census Bureau API via a web browser.

Source:  https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1U0tGSDVDLxfAbb6ZgWfes9OHyJI6oI9b" width='75%' height='75%'></p>
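
The URL above is made up of a base address plus three query parameters (`get`, `for`, and `in`). As a minimal sketch of how those pieces fit together, the following Python snippet rebuilds the same URL from its parts. The names `BASE_URL` and `build_census_url` are hypothetical helpers introduced here for illustration; they are not part of the Census API or of this notebook.

```python
from urllib.parse import urlencode

# Base address of the 2020 ACS 5-Year Subject endpoint used in this example
BASE_URL = "https://api.census.gov/data/2020/acs/acs5/subject"


def build_census_url(state_fips):
    """Return the tract-level query URL for one state (two-digit FIPS code)."""
    params = {
        "get": "NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E",
        "for": "tract:*",
        "in": "state:" + state_fips,
    }
    # safe=",:*" keeps the separators readable instead of percent-encoding them
    return BASE_URL + "?" + urlencode(params, safe=",:*")


print(build_census_url("01"))
```

Changing the argument to another two-digit FIPS code, e.g. `"02"`, gives the corresponding URL for a different state.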

The curl command can be used to retrieve the same information; however, instead of sending the information to the browser, information will be saved into a file.
<h1 align='center'><font color='green'>curl -o <i> < filename > </i> <i> < url > </i></font></h1>

Note:  The **-o** option will write the contents to a file.

In [None]:
%%bash
# bash command to download the AL data from the Census Bureau API
curl -o /content/sample_data/AL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  123k    0  123k    0     0   169k      0 --:--:-- --:--:-- --:--:--  169k


Take a look at the first few lines of this file using the bash head command.

In [None]:
%%bash
  head /content/sample_data/AL.txt

[["NAME","S0801_C01_001E","S0801_C01_009E","S0801_C01_011E","state","county","tract"],
["Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"],
["Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"],
["Census Tract 38.02, Jefferson County, Alabama","2450","4.9","0.5","01","073","003802"],
["Census Tract 38.03, Jefferson County, Alabama","1737","3.1","0.0","01","073","003803"],
["Census Tract 39, Jefferson County, Alabama","502","3.2","0.0","01","073","003900"],
["Census Tract 40, Jefferson County, Alabama","833","10.7","0.0","01","073","004000"],
["Census Tract 42, Jefferson County, Alabama","804","4.4","0.0","01","073","004200"],
["Census Tract 45.01, Jefferson County, Alabama","905","0.0","0.0","01","073","004501"],
["Census Tract 45.02, Jefferson County, Alabama","1553","0.0","0.3","01","073","004502"],


Data from each state will eventually be gathered and the data from all states will be merged.  Thus, for now we will remove the header row so that it is not mistaken for a record in the merged data.

In [None]:
%%bash
# Using sed to remove the 1st line in the file
 sed -i 1d /content/sample_data/AL.txt
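
The `head` output above shows that the response is a JSON array of arrays whose first element is the header. As an alternative to text editing with sed (not the approach this notebook follows), the header can be separated from the records by parsing the JSON directly. The sketch below assumes a two-row excerpt of the output above, closed with `]]` so that it is valid JSON; `parse_census_response` is a hypothetical helper name.

```python
import json

# Two-row excerpt of the response shown above, closed with "]]" so it is valid JSON
sample_response = """[["NAME","S0801_C01_001E","S0801_C01_009E","S0801_C01_011E","state","county","tract"],
["Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"],
["Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"]]"""


def parse_census_response(text):
    """Split a JSON array of arrays into its header row and its data rows."""
    header, *rows = json.loads(text)
    return header, rows


header, rows = parse_census_response(sample_response)
print(header)
print(rows[0])
```

Parsing this way keeps commas inside tract names intact, since each field arrives as its own string.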

Checking to make sure the first line has indeed been removed.

In [None]:
%%bash
  head /content/sample_data/AL.txt

["Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"],
["Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"],
["Census Tract 38.02, Jefferson County, Alabama","2450","4.9","0.5","01","073","003802"],
["Census Tract 38.03, Jefferson County, Alabama","1737","3.1","0.0","01","073","003803"],
["Census Tract 39, Jefferson County, Alabama","502","3.2","0.0","01","073","003900"],
["Census Tract 40, Jefferson County, Alabama","833","10.7","0.0","01","073","004000"],
["Census Tract 42, Jefferson County, Alabama","804","4.4","0.0","01","073","004200"],
["Census Tract 45.01, Jefferson County, Alabama","905","0.0","0.0","01","073","004501"],
["Census Tract 45.02, Jefferson County, Alabama","1553","0.0","0.3","01","073","004502"],
["Census Tract 47.01, Jefferson County, Alabama","2521","0.5","4.2","01","073","004701"],


The following sed commands will remove all [ and ] characters from the data file.

In [None]:
%%bash
# Using sed to remove [ and ] characters
 sed -i 's/\[//g' /content/sample_data/AL.txt
 sed -i 's/\]//g' /content/sample_data/AL.txt

Checking to make sure all [ and ] characters have been removed.

In [None]:
%%bash
  head /content/sample_data/AL.txt

"Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600",
"Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700",
"Census Tract 38.02, Jefferson County, Alabama","2450","4.9","0.5","01","073","003802",
"Census Tract 38.03, Jefferson County, Alabama","1737","3.1","0.0","01","073","003803",
"Census Tract 39, Jefferson County, Alabama","502","3.2","0.0","01","073","003900",
"Census Tract 40, Jefferson County, Alabama","833","10.7","0.0","01","073","004000",
"Census Tract 42, Jefferson County, Alabama","804","4.4","0.0","01","073","004200",
"Census Tract 45.01, Jefferson County, Alabama","905","0.0","0.0","01","073","004501",
"Census Tract 45.02, Jefferson County, Alabama","1553","0.0","0.3","01","073","004502",
"Census Tract 47.01, Jefferson County, Alabama","2521","0.5","4.2","01","073","004701",


The following sed command will remove the trailing comma at the end of each line.

In [None]:
%%bash
# Remove comma at the end of each line
  sed -i 's/,$//' /content/sample_data/AL.txt
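
To make the effect of the sed substitutions concrete, here is a minimal Python sketch that applies the same two clean-up rules to a single line of text. The function name `clean_record` is hypothetical; the regular expressions are intended to mirror the sed expressions used above.

```python
import re


def clean_record(line):
    """Remove all [ and ] characters, then a comma at the very end of the line."""
    line = re.sub(r"[\[\]]", "", line)   # like s/\[//g followed by s/\]//g
    return re.sub(r",$", "", line)       # like s/,$//


raw = '["Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"],'
print(clean_record(raw))
```

Like the sed version, this removes brackets wherever they occur, so it relies on the tract names themselves not containing [ or ] characters.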

Checking to make sure the comma at the end of each line is removed.

In [None]:
%%bash
  head /content/sample_data/AL.txt

"Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"
"Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"
"Census Tract 38.02, Jefferson County, Alabama","2450","4.9","0.5","01","073","003802"
"Census Tract 38.03, Jefferson County, Alabama","1737","3.1","0.0","01","073","003803"
"Census Tract 39, Jefferson County, Alabama","502","3.2","0.0","01","073","003900"
"Census Tract 40, Jefferson County, Alabama","833","10.7","0.0","01","073","004000"
"Census Tract 42, Jefferson County, Alabama","804","4.4","0.0","01","073","004200"
"Census Tract 45.01, Jefferson County, Alabama","905","0.0","0.0","01","073","004501"
"Census Tract 45.02, Jefferson County, Alabama","1553","0.0","0.3","01","073","004502"
"Census Tract 47.01, Jefferson County, Alabama","2521","0.5","4.2","01","073","004701"


The last issue that needs to be dealt with is to include a line break for the last record in each file.  This will be needed when the files are merged together.


<p align='center'><img src="https://drive.google.com/uc?export=view&id=1F-8xQbiVtDMaFEDlBUTCiy6YM-Q-ayL9" width='75%' height='75%'></p>

Taking a look at the hidden characters, notice that the last line is missing its end-of-line character.

In [None]:
%%bash
# Using cat -A to see hidden characters in the file
 tail /content/sample_data/AL.txt | cat -A

"Census Tract 401.08, St. Clair County, Alabama","2184","0.0","0.0","01","115","040108"$
"Census Tract 401.09, St. Clair County, Alabama","1000","0.0","0.0","01","115","040109"$
"Census Tract 401.10, St. Clair County, Alabama","1976","0.0","0.0","01","115","040110"$
"Census Tract 402.06, St. Clair County, Alabama","1274","0.0","0.0","01","115","040206"$
"Census Tract 402.07, St. Clair County, Alabama","1290","0.0","0.0","01","115","040207"$
"Census Tract 402.08, St. Clair County, Alabama","1487","0.3","0.0","01","115","040208"$
"Census Tract 402.09, St. Clair County, Alabama","1661","0.0","0.0","01","115","040209"$
"Census Tract 402.10, St. Clair County, Alabama","1444","0.0","0.0","01","115","040210"$
"Census Tract 402.11, St. Clair County, Alabama","1702","0.0","0.0","01","115","040211"$
"Census Tract 402.12, St. Clair County, Alabama","1079","0.0","0.0","01","115","040212"

The following will add an end-of-line character to the last line of the AL.txt file.

In [None]:
%%bash
# Add an end-of-line character to the last line in the file
  sed -i -e '$a\' /content/sample_data/AL.txt
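
The reason this matters can be shown with a small Python sketch using made-up records (not real Census data). When the first text lacks a final newline, its last record and the first record of the next text end up on the same line after concatenation. The helper name `ensure_trailing_newline` is hypothetical.

```python
def ensure_trailing_newline(text):
    """Append a newline only when the text does not already end with one."""
    if text and not text.endswith("\n"):
        return text + "\n"
    return text


# Made-up stand-ins for two state files
first = '"tract A","02"\n"tract B","02"'    # last line lacks its newline
second = '"tract C","01"\n'

# Without the fix, "tract B" and "tract C" share a line after concatenation
print((first + second).splitlines())

# With the fix, every record stays on its own line
print((ensure_trailing_newline(first) + second).splitlines())
```

The helper leaves text that already ends with a newline unchanged, so applying it more than once is harmless.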

Check to make sure an end-of-line hidden character has been added to the last line of the AL.txt file.

In [None]:
%%bash
 tail /content/sample_data/AL.txt | cat -A

"Census Tract 401.08, St. Clair County, Alabama","2184","0.0","0.0","01","115","040108"$
"Census Tract 401.09, St. Clair County, Alabama","1000","0.0","0.0","01","115","040109"$
"Census Tract 401.10, St. Clair County, Alabama","1976","0.0","0.0","01","115","040110"$
"Census Tract 402.06, St. Clair County, Alabama","1274","0.0","0.0","01","115","040206"$
"Census Tract 402.07, St. Clair County, Alabama","1290","0.0","0.0","01","115","040207"$
"Census Tract 402.08, St. Clair County, Alabama","1487","0.3","0.0","01","115","040208"$
"Census Tract 402.09, St. Clair County, Alabama","1661","0.0","0.0","01","115","040209"$
"Census Tract 402.10, St. Clair County, Alabama","1444","0.0","0.0","01","115","040210"$
"Census Tract 402.11, St. Clair County, Alabama","1702","0.0","0.0","01","115","040211"$
"Census Tract 402.12, St. Clair County, Alabama","1079","0.0","0.0","01","115","040212"$


Finally, before moving on to working with two files, let's remove all files in the /content/sample_data/ folder.

In [None]:
%%bash
# Remove all files from the /content/sample_data/ folder
rm /content/sample_data/*.*

## Working with Two Files

The following curl commands will call the Census Bureau API and obtain data for AL and AK.  Once again, the data being retrieved will be saved into two separate text files.

In [None]:
%%bash
# Using curl to retrieve data from Census Bureau API
curl -o /content/sample_data/AL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01"
curl -o /content/sample_data/AK.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:02"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  123k    0  123k    0     0   152k      0 --:--:-- --:--:-- --:--:--  152k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16748    0 16748    0     0  29485      0 --:--:-- --:--:-- --:--:-- 29485


Again, remove the header row from both files.  A wildcard, i.e. \*, is used in the file name so that the command applies to **all** text files in this directory.

In [None]:
%%bash
# Delete 1st line from all *.txt files
 sed -i 1d /content/sample_data/*.txt

The following sed commands will remove all [ and ] characters from the data file.

In [None]:
%%bash
 sed -i 's/\[//g' /content/sample_data/*.txt
 sed -i 's/\]//g' /content/sample_data/*.txt

The following sed command will remove the trailing comma at the end of each line.

In [None]:
%%bash
  sed -i 's/,$//' /content/sample_data/*.txt

Lastly, ensure that a line break exists for the last line in each file.

In [None]:
%%bash
  sed -i -e '$a\' /content/sample_data/*.txt

There are two files here - one file that contains information for AL and another that contains information for AK. The following command will **merge** (or concatenate) the contents of these two files.  The merged data is saved into a file called Both_States.txt.

In [None]:
%%bash
  cat /content/sample_data/*.txt > /content/sample_data/Both_States.txt
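
The shell expands `*.txt` in sorted name order, so AK.txt is written before AL.txt. The sketch below imitates this merge in Python on tiny stand-in files in a temporary folder (not the real Census data). The function name `merge_text_files` is hypothetical, and skipping the output file itself is an extra safeguard added here, not something the `cat` command above does.

```python
import tempfile
from pathlib import Path


def merge_text_files(folder, output_name):
    """Concatenate every *.txt file in folder, in sorted name order, into output_name."""
    output_path = folder / output_name
    # Skip the output file itself so that re-running never feeds it back in
    sources = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    with output_path.open("w") as out:
        for source in sources:
            out.write(source.read_text())
    return output_path


# Tiny stand-in files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "AL.txt").write_text('"tract in AL","01"\n')
    (folder / "AK.txt").write_text('"tract in AK","02"\n')
    merged = merge_text_files(folder, "Both_States.txt").read_text()

print(merged)
```

Even though AL.txt is created first in this sketch, the AK record comes first in the merged text because the file names are sorted before merging.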

Before proceeding, let us make sure the merge was successful.

In [None]:
%%bash
  head /content/sample_data/Both_States.txt

"Census Tract 1, Aleutians East Borough, Alaska","2246","0.1","0.7","02","013","000100"
"Census Tract 1, Yukon-Koyukuk Census Area, Alaska","368","2.7","0.0","02","290","000100"
"Census Tract 2, Yukon-Koyukuk Census Area, Alaska","647","0.0","0.2","02","290","000200"
"Census Tract 3, Yukon-Koyukuk Census Area, Alaska","733","0.0","0.4","02","290","000300"
"Census Tract 4, Yukon-Koyukuk Census Area, Alaska","382","0.0","0.0","02","290","000400"
"Census Tract 1, Yakutat City and Borough, Alaska","294","0.0","6.5","02","282","000100"
"Census Tract 3, Wrangell City and Borough, Alaska","1011","0.3","1.7","02","275","000300"
"Census Tract 1, Southeast Fairbanks Census Area, Alaska","910","1.2","0.0","02","240","000100"
"Census Tract 4, Southeast Fairbanks Census Area, Alaska","1755","0.3","0.0","02","240","000400"
"Census Tract 1, Skagway Municipality, Alaska","804","0.5","5.8","02","230","000100"


Next, let's ensure that these two files were indeed successfully merged together.  First, obtain the number of lines in the first file using the **wc -l** command.

In [None]:
%%bash
  wc -l /content/sample_data/AK.txt

177 /content/sample_data/AK.txt


Because AK.txt comes first in the merge and has 177 lines, the transition between the two states happens after line 177 of the merged file. The following sed command will print lines 170 through 185 of the Both_States.txt file.  After reviewing this output, it appears the merging of the two files was successful.

In [None]:
%%bash
  sed '170,185!d;=' /content/sample_data/Both_States.txt | paste -d: - -

170:"Census Tract 28.22, Anchorage Municipality, Alaska","2461","0.0","0.5","02","020","002822"
171:"Census Tract 28.23, Anchorage Municipality, Alaska","2281","0.0","0.0","02","020","002823"
172:"Census Tract 29, Anchorage Municipality, Alaska","1379","0.1","2.0","02","020","002900"
173:"Census Tract 9800, Anchorage Municipality, Alaska","0","-666666666.0","-666666666.0","02","020","980000"
174:"Census Tract 9801, Anchorage Municipality, Alaska","3040","0.2","0.3","02","020","980100"
175:"Census Tract 9802, Anchorage Municipality, Alaska","4113","0.4","0.9","02","020","980200"
176:"Census Tract 1, Aleutians West Census Area, Alaska","563","2.5","3.9","02","016","000100"
177:"Census Tract 2, Aleutians West Census Area, Alaska","3110","0.3","0.1","02","016","000200"
178:"Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"
179:"Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"
180:"Census Tract 38.02, Jefferson County, Alaba
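
The sed and paste combination above is compact but somewhat cryptic: `170,185!d` deletes every line outside the range, `=` prints each remaining line number on its own line, and `paste -d: - -` joins each number with its line using a colon. A minimal Python sketch of the same idea is given below, using made-up lines; `numbered_lines` is a hypothetical helper name.

```python
from itertools import islice


def numbered_lines(lines, start, stop):
    """Return 'number:text' strings for lines start..stop (1-indexed, inclusive)."""
    result = []
    for number, text in islice(enumerate(lines, start=1), start - 1, stop):
        result.append(str(number) + ":" + text.rstrip("\n"))
    return result


# Made-up lines standing in for the merged file
sample_lines = ["record " + str(i) + "\n" for i in range(1, 11)]

for entry in numbered_lines(sample_lines, 4, 6):
    print(entry)
```

For the real file, the list of lines could be replaced by an open file handle, since the function only needs something that yields one line at a time.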

After the merge has been completed, the following command can be used to insert a header row back into the file containing the data.

In [None]:
%%bash
  sed -i '1i CensusTract_Name,Number_Workers,Percent_PublicTransportation,Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS' /content/sample_data/Both_States.txt

Making sure the header row has been successfully added to the data file.

In [None]:
%%bash
  head /content/sample_data/Both_States.txt

CensusTract_Name,Number_Workers,Percent_PublicTransportation,Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS
"Census Tract 1, Aleutians East Borough, Alaska","2246","0.1","0.7","02","013","000100"
"Census Tract 1, Yukon-Koyukuk Census Area, Alaska","368","2.7","0.0","02","290","000100"
"Census Tract 2, Yukon-Koyukuk Census Area, Alaska","647","0.0","0.2","02","290","000200"
"Census Tract 3, Yukon-Koyukuk Census Area, Alaska","733","0.0","0.4","02","290","000300"
"Census Tract 4, Yukon-Koyukuk Census Area, Alaska","382","0.0","0.0","02","290","000400"
"Census Tract 1, Yakutat City and Borough, Alaska","294","0.0","6.5","02","282","000100"
"Census Tract 3, Wrangell City and Borough, Alaska","1011","0.3","1.7","02","275","000300"
"Census Tract 1, Southeast Fairbanks Census Area, Alaska","910","1.2","0.0","02","240","000100"
"Census Tract 4, Southeast Fairbanks Census Area, Alaska","1755","0.3","0.0","02","240","000400"


Once again, before moving on to gathering data from all states, let's remove the contents of the /content/sample_data/ folder.

In [None]:
%%bash
rm /content/sample_data/*.*

## Gathering Data from All Census Tracts

The required data processing steps have been defined above.  These same steps will now be used on data collected for **all** census tracts across the entire United States.

A spreadsheet that contains a list of all state abbreviations and FIPS codes is provided here.  The concatenate() spreadsheet function was actually used to write each curl command.

Source: https://docs.google.com/spreadsheets/d/1NijveG6Z3H3_IOZDFF_FwLhwn5hDD2s319OMlVVAp9k/edit?usp=sharing
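
As an alternative to the spreadsheet approach, the same list of curl commands could be generated with a few lines of Python. This is only a sketch: the dictionary `STATE_FIPS` holds just the first three states for brevity, and the names `URL_TEMPLATE` and `curl_command` are hypothetical helpers introduced here.

```python
# First three state abbreviations and FIPS codes only; the full list is in the spreadsheet
STATE_FIPS = {"AL": "01", "AK": "02", "AZ": "04"}

URL_TEMPLATE = (
    "https://api.census.gov/data/2020/acs/acs5/subject"
    "?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E"
    "&for=tract:*&in=state:{fips}"
)


def curl_command(abbreviation, fips, folder="/content/sample_data"):
    """Return the curl command that saves one state's data to <folder>/<abbreviation>.txt."""
    url = URL_TEMPLATE.format(fips=fips)
    return 'curl -o ' + folder + '/' + abbreviation + '.txt "' + url + '"'


for abbreviation, fips in STATE_FIPS.items():
    print(curl_command(abbreviation, fips))
```

The printed lines can be pasted into a bash cell, which keeps the download step itself identical to the one shown next.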

In [None]:
%%bash
curl -o /content/sample_data/AL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01"
curl -o /content/sample_data/AK.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:02"
curl -o /content/sample_data/AZ.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:04"
curl -o /content/sample_data/AR.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:05"
curl -o /content/sample_data/CA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:06"
curl -o /content/sample_data/CO.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:08"
curl -o /content/sample_data/CT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:09"
curl -o /content/sample_data/DE.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:10"
curl -o /content/sample_data/FL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:12"
curl -o /content/sample_data/GA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:13"
curl -o /content/sample_data/HI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:15"
curl -o /content/sample_data/ID.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:16"
curl -o /content/sample_data/IL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:17"
curl -o /content/sample_data/IN.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:18"
curl -o /content/sample_data/IA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:19"
curl -o /content/sample_data/KS.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:20"
curl -o /content/sample_data/KY.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:21"
curl -o /content/sample_data/LA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:22"
curl -o /content/sample_data/ME.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:23"
curl -o /content/sample_data/MD.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:24"
curl -o /content/sample_data/MA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:25"
curl -o /content/sample_data/MI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:26"
curl -o /content/sample_data/MN.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:27"
curl -o /content/sample_data/MS.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:28"
curl -o /content/sample_data/MO.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:29"
curl -o /content/sample_data/MT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:30"
curl -o /content/sample_data/NE.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:31"
curl -o /content/sample_data/NV.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:32"
curl -o /content/sample_data/NH.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:33"
curl -o /content/sample_data/NJ.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:34"
curl -o /content/sample_data/NM.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:35"
curl -o /content/sample_data/NY.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:36"
curl -o /content/sample_data/NC.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:37"
curl -o /content/sample_data/ND.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:38"
curl -o /content/sample_data/OH.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:39"
curl -o /content/sample_data/OK.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:40"
curl -o /content/sample_data/OR.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:41"
curl -o /content/sample_data/PA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:42"
curl -o /content/sample_data/RI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:44"
curl -o /content/sample_data/SC.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:45"
curl -o /content/sample_data/SD.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:46"
curl -o /content/sample_data/TN.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:47"
curl -o /content/sample_data/TX.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:48"
curl -o /content/sample_data/UT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:49"
curl -o /content/sample_data/VT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:50"
curl -o /content/sample_data/VA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:51"
curl -o /content/sample_data/WA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:53"
curl -o /content/sample_data/WV.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:54"
curl -o /content/sample_data/WI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:55"
curl -o /content/sample_data/WY.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:56"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  123k    0  123k    0     0   163k      0 --:--:-- --:--:-- --:--:--  163k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16748    0 16748    0     0  29642      0 --:--:-- --:--:-- --:--:-- 29590
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0     0    0     0    0     0      0      0 --:

**Step 1**:  Remove the header row from each file.

In [None]:
%%bash
 sed -i 1d /content/sample_data/*.txt

**Step 2**: Remove the [ and ] characters from each file.

In [None]:
%%bash
 sed -i 's/\[//g' /content/sample_data/*.txt
 sed -i 's/\]//g' /content/sample_data/*.txt

**Step 3**: Remove the comma at the end of each line.

In [None]:
%%bash
  sed -i 's/,$//' /content/sample_data/*.txt

**Step 4**: Add an end-of-line character to the last line of each file.

In [None]:
%%bash
  sed -i -e '$a\' /content/sample_data/*.txt

**Step 5**: Merge all files together into a single file.

In [None]:
%%bash
  cat /content/sample_data/*.txt > /content/sample_data/All_States.txt

**Step 6**: Add a header row to the merged data.

In [None]:
%%bash
  sed  -i '1i CensusTract_Name,Number_Workers,Percent_PublicTransportation,Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS' /content/sample_data/All_States.txt

Check the size of the combined file to ensure it is reasonable.

In [None]:
%%bash
ls -l /content/sample_data/A*.txt

-rw-r--r-- 1 root root   16131 Apr 28 15:46 /content/sample_data/AK.txt
-rw-r--r-- 1 root root 7310816 Apr 28 15:46 /content/sample_data/All_States.txt
-rw-r--r-- 1 root root  121793 Apr 28 15:46 /content/sample_data/AL.txt
-rw-r--r-- 1 root root   70811 Apr 28 15:46 /content/sample_data/AR.txt
-rw-r--r-- 1 root root  150708 Apr 28 15:46 /content/sample_data/AZ.txt
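
File size is a coarse check. A finer one, sketched below, is to count the records per StateFIPS value in the merged file. The example runs on a small in-memory stand-in built from rows shown earlier rather than on the real All_States.txt, and `records_per_state` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def records_per_state(csv_file):
    """Count data rows per StateFIPS value in a merged file that has a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row["StateFIPS"] for row in reader)


# Small in-memory stand-in for the merged file, using rows shown earlier
sample_file = io.StringIO(
    "CensusTract_Name,Number_Workers,Percent_PublicTransportation,"
    "Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS\n"
    '"Census Tract 1, Aleutians East Borough, Alaska","2246","0.1","0.7","02","013","000100"\n'
    '"Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"\n'
    '"Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"\n'
)

counts = records_per_state(sample_file)
print(counts)
```

On the real file, an open file handle would be passed instead; the count for "02" should then agree with the 177 lines reported for AK.txt earlier, assuming the download has not changed.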


## Analysis Step - Python

The analysis step is fairly straightforward: a Top 10 list is needed.

In [None]:
import pandas as pd

First, read the data into Python.  The StateFIPS, CountyFIPS, and CensusTractFIPS fields will be read in as strings so that their leading zeros are preserved.  These specifications are made using the **dtype** argument of the pd.read_csv() function.

In [None]:
#Read in the data using read_csv() function from the pandas package
Commuter_All_CensusTracts = pd.read_csv("/content/sample_data/All_States.txt", dtype={'StateFIPS':str, 'CountyFIPS':str, 'CensusTractFIPS':str}) 

Taking a look at the pandas DataFrame.

In [None]:
Commuter_All_CensusTracts.head(5)

Unnamed: 0,CensusTract_Name,Number_Workers,Percent_PublicTransportation,Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS
0,"Census Tract 1, Aleutians East Borough, Alaska",2246,0.1,0.7,02,013,000100
1,"Census Tract 1, Yukon-Koyukuk Census Area, Alaska",368,2.7,0.0,02,290,000100
2,"Census Tract 2, Yukon-Koyukuk Census Area, Alaska",647,0.0,0.2,02,290,000200
3,"Census Tract 3, Yukon-Koyukuk Census Area, Alaska",733,0.0,0.4,02,290,000300
4,"Census Tract 4, Yukon-Koyukuk Census Area, Alaska",382,0.0,0.0,02,290,000400


Using shape to get the dimensions of the DataFrame.

In [None]:
Commuter_All_CensusTracts.shape

(84208, 7)

Next, data manipulation will be done using the **dfply** package in Python.

In [None]:
pip install dfply

Collecting dfply
  Downloading dfply-0.3.3-py3-none-any.whl (612 kB)
[?25l[K     |▌                               | 10 kB 24.6 MB/s eta 0:00:01[K     |█                               | 20 kB 28.2 MB/s eta 0:00:01[K     |█▋                              | 30 kB 31.5 MB/s eta 0:00:01[K     |██▏                             | 40 kB 35.1 MB/s eta 0:00:01[K     |██▊                             | 51 kB 36.9 MB/s eta 0:00:01[K     |███▏                            | 61 kB 40.6 MB/s eta 0:00:01[K     |███▊                            | 71 kB 28.5 MB/s eta 0:00:01[K     |████▎                           | 81 kB 28.7 MB/s eta 0:00:01[K     |████▉                           | 92 kB 30.9 MB/s eta 0:00:01[K     |█████▍                          | 102 kB 30.1 MB/s eta 0:00:01[K     |█████▉                          | 112 kB 30.1 MB/s eta 0:00:01[K     |██████▍                         | 122 kB 30.1 MB/s eta 0:00:01[K     |███████                         | 133 kB 30.1 MB/s eta 0:00:

In [None]:
from dfply import *

The following dfply commands will do all the necessary processing steps to obtain a Top 10 List for the proportion of people who bicycle to work.

In [133]:
Bicycle_Top10 = (
                 Commuter_All_CensusTracts 
                 #>> filter_by(X.StateFIPS == '27')                                 # Filter for state if so desired
                 >> filter_by(X.Number_Workers > 50)                               # Filter on making sure Number_Workers is not too small
                 >> arrange(X.Percent_Bicycle, ascending = False)                  # Sort the list by Percent_Bicycle
                 >> head(10)                                                       # Keep only the Top 10 records in this data.frame
                 >> mutate(CensusTract_Link =                                      # Paste together a link for the census tract
                                 'https://censusreporter.org/profiles/14000US'
                                 + X.StateFIPS
                                 + X.CountyFIPS
                                 + X.CensusTractFIPS                           
                            )
                 
                 >> select(X.CensusTract_Name, X.Number_Workers, X.Percent_Bicycle, X.CensusTract_Link) #Select only the needed fields
               )

Bicycle_Top10

Unnamed: 0,CensusTract_Name,Number_Workers,Percent_Bicycle,CensusTract_Link
3931,"Census Tract 205.03, La Paz County, Arizona",134,65.7,https://censusreporter.org/profiles/14000US040...
6449,"Census Tract 105.01, Yolo County, California",2029,41.3,https://censusreporter.org/profiles/14000US061...
4700,"Census Tract 5130, Santa Clara County, California",4143,41.2,https://censusreporter.org/profiles/14000US060...
8113,"Census Tract 9803, Santa Barbara County, Calif...",2340,36.1,https://censusreporter.org/profiles/14000US060...
12702,"Census Tract 5117.05, Santa Clara County, Cali...",757,36.1,https://censusreporter.org/profiles/14000US060...
6465,"Census Tract 107.03, Yolo County, California",2142,32.5,https://censusreporter.org/profiles/14000US061...
8098,"Census Tract 29.26, Santa Barbara County, Cali...",2585,31.5,https://censusreporter.org/profiles/14000US060...
76522,"Census Tract 11.01, Travis County, Texas",1166,28.3,https://censusreporter.org/profiles/14000US484...
76065,"Census Tract 9501, Zavala County, Texas",552,28.1,https://censusreporter.org/profiles/14000US485...
8103,"Census Tract 29.36, Santa Barbara County, Cali...",1724,27.6,https://censusreporter.org/profiles/14000US060...
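
For readers who want to see the logic of this chain without dfply, the sketch below reproduces the same filter, sort, head, and link steps using only the Python standard library. It is an illustration on four rows taken from earlier output, not a replacement for the analysis above; `top_bicycle_tracts` is a hypothetical helper name.

```python
import csv
import io


def top_bicycle_tracts(csv_file, n=10, min_workers=50):
    """Return the n tracts with the highest Percent_Bicycle among tracts above min_workers."""
    rows = [row for row in csv.DictReader(csv_file)
            if int(row["Number_Workers"]) > min_workers]                     # filter step
    rows.sort(key=lambda row: float(row["Percent_Bicycle"]), reverse=True)   # sort step
    return [
        {
            "CensusTract_Name": row["CensusTract_Name"],
            "Number_Workers": int(row["Number_Workers"]),
            "Percent_Bicycle": float(row["Percent_Bicycle"]),
            "CensusTract_Link": "https://censusreporter.org/profiles/14000US"
                                + row["StateFIPS"] + row["CountyFIPS"] + row["CensusTractFIPS"],
        }
        for row in rows[:n]                                                  # head step
    ]


# Four rows taken from earlier output, with the header row added
sample_file = io.StringIO(
    "CensusTract_Name,Number_Workers,Percent_PublicTransportation,"
    "Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS\n"
    '"Census Tract 1, Yakutat City and Borough, Alaska","294","0.0","6.5","02","282","000100"\n'
    '"Census Tract 9800, Anchorage Municipality, Alaska","0","-666666666.0","-666666666.0","02","020","980000"\n'
    '"Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"\n'
    '"Census Tract 1, Skagway Municipality, Alaska","804","0.5","5.8","02","230","000100"\n'
)

top_two = top_bicycle_tracts(sample_file, n=2)
for tract in top_two:
    print(tract)
```

In this sample, the Anchorage row with the -666666666.0 placeholder values drops out at the filter step because its worker count is 0.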


Finally, the **HTML** class from IPython.display will be used to create clickable links within the pandas DataFrame.

In [134]:
from IPython.display import HTML

The final table contains the Top 10 list with clickable links.

In [135]:
HTML(Bicycle_Top10.to_html(render_links=True, escape=False))

Unnamed: 0,CensusTract_Name,Number_Workers,Percent_Bicycle,CensusTract_Link
3931,"Census Tract 205.03, La Paz County, Arizona",134,65.7,https://censusreporter.org/profiles/14000US04012020503
6449,"Census Tract 105.01, Yolo County, California",2029,41.3,https://censusreporter.org/profiles/14000US06113010501
4700,"Census Tract 5130, Santa Clara County, California",4143,41.2,https://censusreporter.org/profiles/14000US06085513000
8113,"Census Tract 9803, Santa Barbara County, California",2340,36.1,https://censusreporter.org/profiles/14000US06083980300
12702,"Census Tract 5117.05, Santa Clara County, California",757,36.1,https://censusreporter.org/profiles/14000US06085511705
6465,"Census Tract 107.03, Yolo County, California",2142,32.5,https://censusreporter.org/profiles/14000US06113010703
8098,"Census Tract 29.26, Santa Barbara County, California",2585,31.5,https://censusreporter.org/profiles/14000US06083002926
76522,"Census Tract 11.01, Travis County, Texas",1166,28.3,https://censusreporter.org/profiles/14000US48453001101
76065,"Census Tract 9501, Zavala County, Texas",552,28.1,https://censusreporter.org/profiles/14000US48507950100
8103,"Census Tract 29.36, Santa Barbara County, California",1724,27.6,https://censusreporter.org/profiles/14000US06083002936




---
End of Document

