# Integrating Colab with Google Drive
If we are going to be grabbing data from the internet for use with Colaboratory, we'll need to use our Google Drive.


In [30]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## The unix terminal
Unix is one of the oldest and most bare-bones operating systems (OS), but it serves as the basis for more advanced OSs such as macOS, and Unix-like environments are also available on Windows.

Unix and Unix-based OSs like macOS have a powerful utility called the "Terminal" that provides a "command line interface" for interacting with your computer (Windows has something similar, but the syntax is different).

Jupyter allows us to run some of these commands as **if we were in a terminal**:

In [0]:
ls

drive/  sample_data/


In [0]:
pwd

'/content'

In [0]:
ls "drive/My Drive/dsps"

interactingWDrive.ipynb


In [0]:
ls -ltr "drive/My Drive/dsps"

total 5
-rw------- 1 root root 4508 Sep 27 00:43 interactingWDrive.ipynb


These Unix commands can also be combined with Python syntax using the os module, as we will see below. First, let's download some data from the web.

To do that you can use wget. Unlike ls, pwd, and cd, wget is not one of the commands that the Colab Jupyter notebook recognizes on its own. It is a regular Unix command, and you can run such commands by starting the line with !



In [0]:
cd "drive/My Drive/dsps"

/content/drive/My Drive/dsps


In [0]:
!wget https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16totals.zip


--2019-09-27 00:46:52--  https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16totals.zip
Resolving www2.census.gov (www2.census.gov)... 23.46.200.113, 2600:1402:3800:29d::208c, 2600:1402:3800:29f::208c
Connecting to www2.census.gov (www2.census.gov)|23.46.200.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘zbp16totals.zip’

zbp16totals.zip         [ <=>                ] 750.30K  --.-KB/s    in 0.1s    

2019-09-27 00:46:55 (6.29 MB/s) - ‘zbp16totals.zip’ saved [768303]



In [0]:
ls

interactingWDrive.ipynb  zbp16totals.zip


The same goes for `unzip`, which expands a compressed zip file. The argument -d <path> extracts the contents into the directory "path".



In [0]:
!unzip zbp16totals.zip -d "./"

Archive:  zbp16totals.zip
  inflating: ./zbp16totals.txt       


In [0]:
ls

interactingWDrive.ipynb  zbp16totals.txt  zbp16totals.zip
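
As an alternative to calling unzip through the shell, Python's standard zipfile module can expand an archive directly. The following is a minimal sketch: the helper name `extract_zip` is hypothetical, and the file name in the usage comment is the one downloaded in the cells above.

```python
import os
import zipfile


def extract_zip(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target directory if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Usage in the notebook (not run here):
# extract_zip("zbp16totals.zip", "./")
```

Because this runs inside Python, a problem such as a missing or corrupt archive surfaces as an exception rather than as a numeric status code.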


Another way to access Unix commands is to use the os module, specifically the function os.system(). This function runs the command and returns a status code that tells you whether it succeeded or failed, not the command's output. So it is not useful, for example, for running ls, where the whole point is to see the output.



In [0]:
import os
cmd = "ls ./"
os.system(cmd)

0

0 means no error: the command was executed without a problem. Any other number indicates an error (and you can look up the meaning of the number). But I do not see the output of the command, so use this if you want to know whether the command worked but do not need to see its output.
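
When the output of a command is needed as well as its status, the standard subprocess module is one option. This is a minimal sketch, not part of the original notebook; the helper name `run_and_capture` is hypothetical.

```python
import subprocess


def run_and_capture(args):
    """Run a command given as a list of arguments; return (exit code, captured stdout)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout


# -- same command as above, but now the listing is available as a Python string
code, output = run_and_capture(["ls", "./"])
print("exit code:", code)
print(output)
```

Passing the command as a list of arguments also avoids having to escape spaces, because no shell is involved.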

For example, I can use it to create directories with mkdir and to remove files and directories with rm (use rm -r, which means "recursively", for a directory).

Note the syntax: when a path contains a space, you need to escape the space by putting a backslash in front of it ("\ ") so that the shell reads the path correctly when you use os.system()

In [0]:
cmd = "mkdir test_dir"
os.system(cmd)

0

In [0]:
ls

interactingWDrive.ipynb  test_dir/  zbp16totals.txt  zbp16totals.zip


In [0]:
cmd = "rm -r test_dir"
os.system(cmd)

0

In [0]:
ls "test_dir"

ls: cannot access 'test_dir': No such file or directory
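
Regarding the note on escaping spaces: instead of inserting backslashes by hand, the standard shlex.quote function can quote a path for the shell. The sketch below uses a temporary directory as a stand-in for a path with a space in it such as "My Drive"; the directory names are only for illustration.

```python
import os
import shlex
import tempfile

# -- temporary stand-in for a path that contains a space, like ".../My Drive/..."
base = tempfile.mkdtemp()
target = os.path.join(base, "My Drive", "test_dir")

# -- shlex.quote wraps the path in quotes so the shell treats it as one argument
cmd = "mkdir -p {}".format(shlex.quote(target))
print(cmd)

# -- run the command and confirm that the directory now exists
status = os.system(cmd)
print(status, os.path.isdir(target))
```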


You can string Unix commands together on one line with the semicolon ; (cd, which stands for "change directory", moves you around the file system)

In [0]:
ls; pwd

interactingWDrive.ipynb  zbp16totals.txt  zbp16totals.zip
/content/drive/My Drive/dsps
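
One caveat when moving around with cd from Python: each os.system call starts its own shell, so a cd inside it does not change the working directory of the notebook itself, whereas commands chained with ; inside a single call do share that shell. (The bare cd used in the cells above is a notebook command and does change the notebook's directory.) A minimal sketch, using a temporary directory as a stand-in for a real target:

```python
import os
import shlex
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # stand-in for a directory such as "drive/My Drive/dsps"

# -- cd and pwd run in the same child shell, so pwd prints the new directory...
os.system("cd {}; pwd".format(shlex.quote(target)))
# -- ...but the Python process itself has not moved
after_system = os.getcwd()

# -- os.chdir changes the working directory of the Python process
os.chdir(target)
after_chdir = os.getcwd()

# -- go back to where we started
os.chdir(start)
```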


But again, **these are not Python commands and can't be used as such on their own**.

We'll see how combining Unix commands with Python via os.system can be put to good use below.


### The wget command
When dealing with data on the web, the wget command can be very useful for quickly grabbing the data that you want. Let's get the ZIP Code Business Patterns data for 2016 (note the exclamation point before the wget command):

In [0]:
!wget https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip


--2019-09-27 01:03:06--  https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip
Resolving www2.census.gov (www2.census.gov)... 23.46.200.113
Connecting to www2.census.gov (www2.census.gov)|23.46.200.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘zbp16detail.zip’

zbp16detail.zip         [         <=>        ]  27.51M  14.9MB/s    in 1.8s    

2019-09-27 01:04:11 (14.9 MB/s) - ‘zbp16detail.zip’ saved [28843251]



In [31]:
ls

interactingWDrive.ipynb  zbp16detail.zip  zbp16totals.txt  zbp16totals.zip
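
If wget is not available, a download can also be done from Python with the standard urllib module. This is a sketch rather than the method used in this notebook; the helper name `download` is hypothetical and the usage comment reuses the URL from the cell above.

```python
import urllib.request


def download(url, filename):
    """Fetch url and save it as filename; return the path of the saved file."""
    saved_path, _headers = urllib.request.urlretrieve(url, filename)
    return saved_path


# Usage in the notebook (not run here):
# download("https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip",
#          "zbp16detail.zip")
```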


Now we can see the utility of combining unix commands with python to make our lives much easier:

In [33]:
# -- define a command to make a sub-directory called zbp in the current directory
cmd = "mkdir zbp"
# -- run that command
os.system(cmd)
# -- check if the command worked
!ls

interactingWDrive.ipynb  zbp  zbp16detail.zip  zbp16totals.txt	zbp16totals.zip


In [34]:
ls

interactingWDrive.ipynb  zbp16detail.zip  zbp16totals.zip
zbp/                     zbp16totals.txt


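The last package to discuss is glob. Given a pattern such as "dir/*", glob.glob() returns the matching contents of a directory as a list; given a plain path, it returns a one-element list if the path exists. If nothing matches, the list is empty and its length is...0!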



In [47]:
import glob
if len(glob.glob("/content/drive/My Drive/dsps"))==0:
    # -- raw string, so the backslash reaches the shell and escapes the space
    os.system(r"mkdir /content/drive/My\ Drive/dsps")

glob.glob("/content/drive/My Drive/dsps")

['/content/drive/My Drive/dsps']
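
The same "create it if it is missing" check can be written without the shell by using os.makedirs, and a wildcard pattern lets glob list what is inside a directory. The sketch below uses a temporary directory as a stand-in for the Drive path; the file name `example.txt` is only for illustration.

```python
import glob
import os
import tempfile

base = tempfile.mkdtemp()  # stand-in for "/content/drive/My Drive"
data_dir = os.path.join(base, "dsps")

# -- creates the directory, and does nothing if it already exists
os.makedirs(data_dir, exist_ok=True)

# -- an existing but empty directory matches itself, yet has no contents
print(glob.glob(data_dir))
print(glob.glob(os.path.join(data_dir, "*")))

# -- after adding a file, the wildcard pattern finds it
open(os.path.join(data_dir, "example.txt"), "w").close()
matches = glob.glob(os.path.join(data_dir, "*.txt"))
print(matches)
```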

## TASK
Create a for loop that downloads the ZIP Code data files zbp[year]detail.zip, where [year] goes from 2010 through 2014 inclusive. The for loop should use the os module to:

- download the zipped file
- unzip the file in dsps2019 dir (create it if necessary)
- clean up after itself by removing the downloaded zip files
- print at every step of the loop which year the loop is working on and at each command which command is running.
- notify the user if an error occurs and where

In [68]:
path = "https://www2.census.gov/programs-surveys/cbp/datasets"
ii = 16
snum = str(ii)

# -- build the wget command
cmd_wget = "wget {}/20{}/zbp{}detail.zip".format(path, ii, ii)
cmd_wget

'wget https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip'
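
The command-building step can be wrapped in a small function so that it is easy to reuse inside a loop. This is a sketch: the function name `build_wget_cmd` is hypothetical, and the zero padding for single-digit years is an assumption about how earlier files would be named, which is not checked here.

```python
BASE_URL = "https://www2.census.gov/programs-surveys/cbp/datasets"


def build_wget_cmd(year):
    """Return the wget command for a two-digit year such as 16."""
    # -- {:02d} pads single-digit years with a zero, e.g. 5 -> "05"
    yy = "{:02d}".format(year)
    return "wget {}/20{}/zbp{}detail.zip".format(BASE_URL, yy, yy)


# -- print the commands for 2010 through 2014 without running them
for year in range(10, 15):
    print(build_wget_cmd(year))
```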

In [69]:
path = "https://www2.census.gov/programs-surveys/cbp/datasets"

# -- create the output directory if needed: glob.glob returns an empty list
#    when the path does not exist, and os.system returns non-zero on error
if len(glob.glob("/content/drive/My Drive/dsps/zbp")) == 0:
    res = os.system(r"mkdir /content/drive/My\ Drive/dsps/zbp")
    if res != 0:
        print("err", res, "with mkdir")

# -- loop through the years for which we want the zbp data
for ii in range(10, 15):

    # -- convert year to string
    snum = str(ii)

    # -- build the wget command
    cmd_wget = "wget {}/20{}/zbp{}detail.zip".format(path, snum, snum)

    # -- build the unzip command (raw string so "\ " reaches the shell as an escaped space)
    cmd_unzip = r"unzip zbp{}detail.zip -d /content/drive/My\ Drive/dsps/zbp/".format(snum)

    # -- build the remove command
    cmd_rm = "rm zbp{}detail.zip".format(snum)

    # -- alert on progress
    print("downloading year 20{}".format(snum))

    # -- execute the wget command and check
    res = os.system(cmd_wget)
    if res != 0:
        print("error ", res, "with downloading", ii)

    # -- alert on progress
    print("moving file...")

    # -- execute the unzip command and check
    res = os.system(cmd_unzip)
    if res != 0:
        print("error ", res, "with unzipping", ii)

    # -- alert on progress
    print("remove the zip file")

    # -- execute the remove command and check
    res = os.system(cmd_rm)
    if res != 0:
        print("error ", res, "with removing", ii)


downloading year 2010
moving file...
remove the zip file
downloading year 2011
moving file...
remove the zip file
downloading year 2012
moving file...
remove the zip file
downloading year 2013
moving file...
remove the zip file
downloading year 2014
moving file...
remove the zip file


In [70]:
ls "/content/drive/My Drive/dsps/zbp"

zbp10detail.txt  zbp12detail.txt  zbp14detail.txt
zbp11detail.txt  zbp13detail.txt
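
To meet the task's requirement of printing each command as it runs and reporting where an error occurs, the repeated "run and check" pattern could be factored into a helper. The following is a sketch only: the names `run_step` and `process_steps` are hypothetical, and the shell commands true and false are harmless stand-ins for the wget, unzip, and rm commands.

```python
import os


def run_step(cmd, label):
    """Print the command, run it with os.system, and report whether it succeeded."""
    print("running [{}]: {}".format(label, cmd))
    res = os.system(cmd)
    if res != 0:
        print("error", res, "during", label)
        return False
    return True


def process_steps(steps):
    """Run (label, command) pairs in order; stop at the first failure."""
    for label, cmd in steps:
        if not run_step(cmd, label):
            return label  # name of the step that failed
    return None


# -- "true" always succeeds and "false" always fails
failed = process_steps([("download", "true"), ("unzip", "false"), ("remove", "true")])
print("failed step:", failed)
```

Stopping at the first failure avoids trying to unzip or delete a file that was never downloaded.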
