<a href="https://colab.research.google.com/github/christophermalone/DSCI325/blob/main/Module6_Part3_CensusAPI_Bash.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Census API - BASH

Bash is a command-line shell and scripting language that is available on a variety of operating systems, e.g. Linux, Mac, and Windows.  Bash handles some tasks more efficiently than other languages do, especially the management of files.

One method of getting a file from a server with bash is **curl**.  The name curl stands for "Client for URL"; it is a command-line tool for transferring data using various network protocols.  Curl allows one to download or upload files using URL syntax.


Source: https://en.wikipedia.org/wiki/CURL

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Example 6.3.Bash
Consider an investigation of the proportion of people who bicycle to work. The data for this investigation will be obtained from the US Census Bureau, specifically the American Community Survey (ACS) 5-Year Subject data.
 

*   Data Source: American Community Survey - 5 Year data
*   Census Unit:  Census Tract
*   Variables: NAME of Census Tract; S0801_C01_001E, i.e. number of workers age 16+; S0801_C01_009E, i.e. % of workers who use public transportation to get to work; S0801_C01_011E, i.e. % of workers who bicycle to work
*   Other:  for = tract:*, i.e. all tracts, in = state:01, i.e. FIPS code for AL

Note:  There are a very large number of census tracts in the US; thus, specifying a state is required.

Source: https://www.census.gov/data/developers/data-sets/acs-5year.html

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Preparing a Single File

Consider an initial call to the Census Bureau API via a web browser.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1U0tGSDVDLxfAbb6ZgWfes9OHyJI6oI9b" width='75%' height='75%'></p>

Source:  https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01

The curl command can be used to retrieve the same information; however, instead of sending the information to the browser, information will be saved into a file.
<h1 align='center'><font color='green'>curl -o <i> < filename > </i> <i> < url > </i></font></h1>

Note:  The **-o** option writes the response to the named file instead of printing it to the terminal.

In [61]:
%%bash
# Download the AL data from the Census Bureau API
curl -o /content/sample_data/AL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 53599    0 53599    0     0  64190      0 --:--:-- --:--:-- --:--:-- 64113
100  123k    0  123k    0     0   136k      0 --:--:-- --:--:-- --:--:--  136k


Take a look at this file using the bash **head** command, which prints the first 10 lines of a file by default.

In [53]:
%%bash
  head /content/sample_data/AL.txt

[["NAME","S0801_C01_001E","S0801_C01_009E","S0801_C01_011E","state","county","tract"],
["Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"],
["Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"],
["Census Tract 38.02, Jefferson County, Alabama","2450","4.9","0.5","01","073","003802"],
["Census Tract 38.03, Jefferson County, Alabama","1737","3.1","0.0","01","073","003803"],
["Census Tract 39, Jefferson County, Alabama","502","3.2","0.0","01","073","003900"],
["Census Tract 40, Jefferson County, Alabama","833","10.7","0.0","01","073","004000"],
["Census Tract 42, Jefferson County, Alabama","804","4.4","0.0","01","073","004200"],
["Census Tract 45.01, Jefferson County, Alabama","905","0.0","0.0","01","073","004501"],
["Census Tract 45.02, Jefferson County, Alabama","1553","0.0","0.3","01","073","004502"],


Data from each state will eventually be gathered and the data from all states will be merged.  Thus, for now we will remove the header row so that it is not mistaken for a record in the merged data.

In [54]:
%%bash
# Use sed to remove the 1st line in the file
 sed -i 1d /content/sample_data/AL.txt

The following sed commands will remove all [ and ] characters from the data file.

In [55]:
%%bash
# Use sed to remove [ and ] characters
 sed -i 's/\[//g' /content/sample_data/AL.txt
 sed -i 's/\]//g' /content/sample_data/AL.txt
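
To make the effect of these substitutions visible, here is a minimal sketch on a made-up two-line sample. The temporary file and its contents are assumptions for illustration, not actual output from the API. It also shows that the two commands can be combined into one by using the bracket expression `[][]`, which matches either character.

```shell
# Hypothetical sample shaped like the last two rows of an API response.
tmp=$(mktemp)
printf '%s\n' \
  '["Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"],' \
  '["Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"]]' > "$tmp"

# Inside a bracket expression a leading ] is literal, so [][] matches ] or [.
cleaned=$(sed 's/[][]//g' "$tmp")
printf '%s\n' "$cleaned"

rm -f "$tmp"
```

The printed rows keep their quotes and commas; only the square brackets are gone.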

The following sed command will remove the trailing comma at the end of each line.

In [56]:
%%bash
  sed -i 's/,$//' /content/sample_data/AL.txt
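
As a small sketch of why this pattern is safe, the example below runs the same substitution on two made-up rows (the sample text is an assumption for illustration). The `$` anchors the match to the end of the line, so the commas inside the tract name and between the fields are left alone.

```shell
# Hypothetical sample: the first row ends with a comma, the second does not.
tmp=$(mktemp)
printf '%s\n' \
  '"Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600",' \
  '"Census Tract 37, Jefferson County, Alabama","1399","6.7","0.0","01","073","003700"' > "$tmp"

# Remove a comma only when it is the last character on the line.
trimmed=$(sed 's/,$//' "$tmp")
printf '%s\n' "$trimmed"

rm -f "$tmp"
```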

The last issue is to make sure that the last record in each file ends with a line break.  Without it, the last record of one file and the first record of the next file would run together on a single line when the files are merged.

In [57]:
%%bash
  sed -i -e '$a\' /content/sample_data/AL.txt
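
The `$a\` expression can look cryptic: `$` addresses the last line and `a\` appends (empty) text after it. A minimal sketch of its effect, assuming GNU sed (the version available on Colab) and a made-up file that lacks a final line break:

```shell
tmp=$(mktemp)
printf 'row1\nrow2' > "$tmp"            # hypothetical file, no newline after row2

before=$(wc -c < "$tmp" | tr -d ' ')    # size in bytes before the fix
sed -i -e '$a\' "$tmp"                  # GNU sed: adds the missing final newline
after=$(wc -c < "$tmp" | tr -d ' ')
sed -i -e '$a\' "$tmp"                  # a second run leaves the file unchanged
again=$(wc -c < "$tmp" | tr -d ' ')

echo "$before $after $again"            # prints: 9 10 10
rm -f "$tmp"
```

Because the command only adds a line break when one is missing, it is safe to run on files that already end correctly.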

Finally, before moving on to working with two files, let's remove all files in the /content/sample_data/ folder.

In [67]:
%%bash
# Remove all files from the /content/sample_data/ folder
rm /content/sample_data/*.*

## Working with Two Files

The following curl commands will call the Census Bureau API and obtain data for AL and AK.  Once again, the data being retrieved will be saved into two separate text files.

In [68]:
%%bash
# Use curl to retrieve data from the Census Bureau API
curl -o /content/sample_data/AL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01"
curl -o /content/sample_data/AK.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:02"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  8639    0  8639    0     0  11674      0 --:--:-- --:--:-- --:--:-- 11658
100  123k    0  123k    0     0   147k      0 --:--:-- --:--:-- --:--:--  147k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  8639    0  8639    0     0  12874      0 --:--:-- --:--:-- --:--:-- 12855
100 16748    0 16748    0     0  24959      0 --:--:-- --:--:-- --:--:-- 24922


Again, remove the header row from both files.  A wildcard, i.e. \*, is used in the file name here, so the command removes the header row from **all** text files in this directory.

In [69]:
%%bash
# Delete the 1st line from all *.txt files
 sed -i 1d /content/sample_data/*.txt
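
One detail worth checking is that `1d` really deletes the first line of every file, not just the first line of the first file. With GNU sed, `-i` treats each file separately (it implies the `-s` option). The sketch below, using two made-up files in a temporary directory, compares the two behaviors without modifying anything in place.

```shell
# Hypothetical stand-ins for two downloaded state files.
dir=$(mktemp -d)
printf 'header\nAL row\n' > "$dir/AL.txt"
printf 'header\nAK row\n' > "$dir/AK.txt"

# Without -i or -s the files are read as one stream: only one line is deleted.
one_stream=$(sed 1d "$dir"/*.txt)
# With -s (GNU sed; implied by -i) line 1 of each file is deleted.
per_file=$(sed -s 1d "$dir"/*.txt)

printf 'one stream:\n%s\nper file:\n%s\n' "$one_stream" "$per_file"
rm -rf "$dir"
```

In the "one stream" output the header of the second file survives, which is the situation the `-i` form avoids.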

The following sed commands will remove all [ and ] characters from both data files.

In [70]:
%%bash
 sed -i 's/\[//g' /content/sample_data/*.txt
 sed -i 's/\]//g' /content/sample_data/*.txt

The following sed command will remove the trailing comma at the end of each line.

In [71]:
%%bash
  sed -i 's/,$//' /content/sample_data/*.txt

Lastly, ensure that a line break exists for the last line in each file.

In [72]:
%%bash
  sed -i -e '$a\' /content/sample_data/*.txt

There are two files here - one file that contains information for AL and another that contains information for AK. The following command will **merge** (or concatenate) the contents of these two files.  The merged data is saved into a file called Both_States.txt.

In [73]:
%%bash
  cat /content/sample_data/*.txt > /content/sample_data/Both_States.txt
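
A caveat with this pattern: the merged file is written into the same folder and also ends in .txt, so if the cell is run a second time, Both_States.txt itself matches `*.txt` and is listed among the inputs. One possible way around this, sketched below with made-up files in a temporary directory, is to use the narrower pattern `??.txt`, which matches only two-character names such as AL.txt. This is an alternative suggestion, not the approach used in the cell above.

```shell
# Hypothetical stand-ins for two cleaned state files.
dir=$(mktemp -d)
printf 'AL row\n' > "$dir/AL.txt"
printf 'AK row\n' > "$dir/AK.txt"

# ??.txt never matches Both_States.txt, so running the merge twice
# still reads only the two state files.
cat "$dir"/??.txt > "$dir/Both_States.txt"
cat "$dir"/??.txt > "$dir/Both_States.txt"

merged_lines=$(wc -l < "$dir/Both_States.txt" | tr -d ' ')
echo "$merged_lines"
rm -rf "$dir"
```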

After the merge is complete, the following command can be used to insert a header row back into the file containing the data.

In [74]:
%%bash
  sed -i '1i CensusTract_Name,Number_Workers,Percent_PublicTransportation,Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS' /content/sample_data/Both_States.txt
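
As a quick sanity check, the sketch below inserts a header into a made-up one-row file and counts the header fields, which should equal the seven columns in each record. It assumes GNU sed, where the one-line form `1i text` inserts the text before line 1, and it writes the column names without spaces after the commas so that a CSV reader does not pick up leading spaces in the names.

```shell
# Hypothetical one-row data file.
tmp=$(mktemp)
printf '%s\n' '"Census Tract 36, Jefferson County, Alabama","1921","0.0","0.0","01","073","003600"' > "$tmp"

# Insert the header row before line 1 (GNU sed one-line form).
sed -i '1i CensusTract_Name,Number_Workers,Percent_PublicTransportation,Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS' "$tmp"

header=$(head -n 1 "$tmp")
# The header contains no quoted commas, so splitting on commas is safe here.
fields=$(printf '%s\n' "$header" | awk -F, '{print NF}')
echo "$fields"
rm -f "$tmp"
```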

Once again, before moving on to gathering data from all states, let's remove the contents of the /content/sample_data/ folder.

In [75]:
%%bash
rm /content/sample_data/*.*

## Gathering Data from All Census Tracts

Source: https://docs.google.com/spreadsheets/d/1NijveG6Z3H3_IOZDFF_FwLhwn5hDD2s319OMlVVAp9k/edit?usp=sharing

In [43]:
%%bash
curl -o /content/sample_data/AL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:01"
curl -o /content/sample_data/AK.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:02"
curl -o /content/sample_data/AZ.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:04"
curl -o /content/sample_data/AR.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:05"
curl -o /content/sample_data/CA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:06"
curl -o /content/sample_data/CO.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:08"
curl -o /content/sample_data/CT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:09"
curl -o /content/sample_data/DE.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:10"
curl -o /content/sample_data/FL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:12"
curl -o /content/sample_data/GA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:13"
curl -o /content/sample_data/HI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:15"
curl -o /content/sample_data/ID.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:16"
curl -o /content/sample_data/IL.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:17"
curl -o /content/sample_data/IN.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:18"
curl -o /content/sample_data/IA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:19"
curl -o /content/sample_data/KS.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:20"
curl -o /content/sample_data/KY.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:21"
curl -o /content/sample_data/LA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:22"
curl -o /content/sample_data/ME.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:23"
curl -o /content/sample_data/MD.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:24"
curl -o /content/sample_data/MA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:25"
curl -o /content/sample_data/MI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:26"
curl -o /content/sample_data/MN.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:27"
curl -o /content/sample_data/MS.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:28"
curl -o /content/sample_data/MO.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:29"
curl -o /content/sample_data/MT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:30"
curl -o /content/sample_data/NE.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:31"
curl -o /content/sample_data/NV.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:32"
curl -o /content/sample_data/NH.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:33"
curl -o /content/sample_data/NJ.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:34"
curl -o /content/sample_data/NM.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:35"
curl -o /content/sample_data/NY.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:36"
curl -o /content/sample_data/NC.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:37"
curl -o /content/sample_data/ND.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:38"
curl -o /content/sample_data/OH.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:39"
curl -o /content/sample_data/OK.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:40"
curl -o /content/sample_data/OR.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:41"
curl -o /content/sample_data/PA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:42"
curl -o /content/sample_data/RI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:44"
curl -o /content/sample_data/SC.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:45"
curl -o /content/sample_data/SD.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:46"
curl -o /content/sample_data/TN.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:47"
curl -o /content/sample_data/TX.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:48"
curl -o /content/sample_data/UT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:49"
curl -o /content/sample_data/VT.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:50"
curl -o /content/sample_data/VA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:51"
curl -o /content/sample_data/WA.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:53"
curl -o /content/sample_data/WV.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:54"
curl -o /content/sample_data/WI.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:55"
curl -o /content/sample_data/WY.txt "https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E&for=tract:*&in=state:56"

Process is interrupted.
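
The fifty curl commands above differ only in the state abbreviation and FIPS code, so they could also be generated with a loop, which is easier to rerun after an interrupted download. The following is a minimal sketch, not the method used in this notebook; the variable names (`states`, `base`, `vars`, `outdir`) are assumptions introduced for illustration, and the abbreviation and FIPS pairs are copied from the commands above. As written it is a dry run that only prints each command.

```shell
# State abbreviation:FIPS pairs, copied from the curl commands above.
states="AL:01 AK:02 AZ:04 AR:05 CA:06 CO:08 CT:09 DE:10 FL:12 GA:13
HI:15 ID:16 IL:17 IN:18 IA:19 KS:20 KY:21 LA:22 ME:23 MD:24
MA:25 MI:26 MN:27 MS:28 MO:29 MT:30 NE:31 NV:32 NH:33 NJ:34
NM:35 NY:36 NC:37 ND:38 OH:39 OK:40 OR:41 PA:42 RI:44 SC:45
SD:46 TN:47 TX:48 UT:49 VT:50 VA:51 WA:53 WV:54 WI:55 WY:56"

base="https://api.census.gov/data/2020/acs/acs5/subject"
vars="NAME,S0801_C01_001E,S0801_C01_009E,S0801_C01_011E"
outdir="/content/sample_data"

count=0
for pair in $states; do
  abbr=${pair%%:*}   # text before the colon, e.g. AL
  fips=${pair##*:}   # text after the colon, e.g. 01
  url="${base}?get=${vars}&for=tract:*&in=state:${fips}"
  # Dry run: print the command. Remove "echo" to actually download.
  echo curl -o "${outdir}/${abbr}.txt" "$url"
  count=$((count + 1))
done
echo "$count commands"
```

Removing the `echo` in front of `curl` turns the dry run into real downloads; the URL stays quoted, so the `&` and `*` characters are passed to curl unchanged.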



In [44]:
%%bash
 sed -i 1d /content/sample_data/*.txt

In [45]:
%%bash
 sed -i 's/\[//g' /content/sample_data/*.txt
 sed -i 's/\]//g' /content/sample_data/*.txt

In [46]:
%%bash
  sed -i 's/,$//' /content/sample_data/*.txt

In [47]:
%%bash
  sed -i -e '$a\' /content/sample_data/*.txt

In [48]:
%%bash
  cat /content/sample_data/*.txt > /content/sample_data/All_States.txt

In [49]:
%%bash
  sed  -i '1i CensusTract_Name,Number_Workers,Percent_PublicTransportation,Percent_Bicycle,StateFIPS,CountyFIPS,CensusTractFIPS' /content/sample_data/All_States.txt

In [50]:
%%bash
ls -l /content/sample_data/

total 2576
-rw-r--r-- 1 root root   16131 Apr 28 05:55 AK.txt
-rw-r--r-- 1 root root 1314713 Apr 28 05:56 All_States.txt
-rw-r--r-- 1 root root  121793 Apr 28 05:55 AL.txt
-rw-r--r-- 1 root root   70811 Apr 28 05:55 AR.txt
-rw-r--r-- 1 root root  150708 Apr 28 05:55 AZ.txt
-rw-r--r-- 1 root root  831458 Apr 28 05:55 CA.txt
-rw-r--r-- 1 root root  123698 Apr 28 05:55 CO.txt


## Bringing data into Python

In [23]:
import pandas as pd

In [24]:

Commuter_All_CensusTracts = pd.read_csv("/content/sample_data/All_States.txt", dtype={'StateFIPS':str, 'CountyFIPS':str, 'CensusTractFIPS':str}) 

In [25]:
Commuter_All_CensusTracts.head(5)

|   | CensusTract_Name | Number_Workers | Percent_PublicTransportation | Percent_Bicycle | StateFIPS | CountyFIPS | CensusTractFIPS |
|---|---|---|---|---|---|---|---|
| 0 | Census Tract 1, Aleutians East Borough, Alaska | 2246 | 0.1 | 0.7 | 2 | 13 | 100 |
| 1 | Census Tract 1, Yukon-Koyukuk Census Area, Alaska | 368 | 2.7 | 0.0 | 2 | 290 | 100 |
| 2 | Census Tract 2, Yukon-Koyukuk Census Area, Alaska | 647 | 0.0 | 0.2 | 2 | 290 | 200 |
| 3 | Census Tract 3, Yukon-Koyukuk Census Area, Alaska | 733 | 0.0 | 0.4 | 2 | 290 | 300 |
| 4 | Census Tract 4, Yukon-Koyukuk Census Area, Alaska | 382 | 0.0 | 0.0 | 2 | 290 | 400 |


In [26]:
Commuter_All_CensusTracts.shape

(1614, 7)

In [27]:
pip install dfply

^C


In [28]:
from dfply import *

ModuleNotFoundError: ignored

In [None]:
# Top 10 census tracts by percent of workers who bicycle to work
Bicycle_Top10 = (
    Commuter_All_CensusTracts
    # >> filter_by(X.StateFIPS == '27')   # Optional: keep only MN (StateFIPS = 27)
    >> arrange(X.Percent_Bicycle, ascending=False)
    >> head(10)
    # Build a Census Reporter link from the FIPS codes (read in as strings above)
    >> mutate(CensusTract_Link=
                  'https://censusreporter.org/profiles/14000US'
                  + X.StateFIPS
                  + X.CountyFIPS
                  + X.CensusTractFIPS)
    >> select(X.CensusTract_Name, X.Percent_Bicycle, X.CensusTract_Link)
)

Bicycle_Top10

In [None]:
from IPython.display import HTML

In [None]:
HTML(Bicycle_Top10.to_html(render_links=True, escape=False))


