# **Lab 9 — Downloading files from URLs**
---

## Introduction

In this lab we will practice downloading files from URLs online and examining their structure using some common Unix command-line tools.

Your deliverable for this lab will be this notebook, with answers to questions completed as requested below. Please rename the notebook from `lab_09.ipynb` to `<last_name>_lab_09.ipynb` prior to submission. Download the completed file by checking its box in the JupyterHub file browser and selecting **Download** from the menu that appears. Submit it to Canvas under the Lab 9 assignment.

Open a new terminal for this lab in JupyterLab via **File $\rightarrow$ New $\rightarrow$ Terminal**.

Make an empty directory inside of your GEOS636_PAG directory to work in for this lab:

`cd ~/GEOS636_PAG`  
`mkdir lab_09`  
`cd lab_09`

## Exercise I: Introducing `curl` and `wget`

You may have noticed the `curl` command in previous labs, where we've used it to download simple text files from GitHub or other online sources. An excellent introduction to downloading files with `curl` can be found [here](http://www.compciv.org/recipes/cli/downloading-with-curl/). `wget` is a similar command that can also be used to download files — a guide can be found [here](https://www.pair.com/support/kb/paircloud-downloading-files-with-wget/).

The commands have very similar syntax, illustrated below. Try these commands in the terminal:
```text
curl --remote-name https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/lab_08/file1
cat file1

wget https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/lab_08/file1
cat file1
```
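Each command downloads [`file1`](https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/lab_08/file1) (follow the link to see the file's location online and its contents) and stores it with the original remote filename (i.e., `file1`). Note that using the remote name is the default for `wget`. Also note that if `file1` already exists when `wget` runs, `wget` does not overwrite it by default and instead saves the new copy as `file1.1`; remove the first copy (`rm file1`) beforehand if you want to see `wget` create `file1` itself.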

Of course, you could also navigate to the online file in your browser and save it to your computer manually. For example, on Google Chrome with macOS I can right-click and select **Save As...** to save the file. However, the file would then be stored on your own computer rather than on the OpenSARlab server where you want to run your analysis. In addition, `curl` and `wget` are extremely useful for **automating** this process.

## Deliverable 1  <font color='red'>(25 points)</font>

The following is designed to help you understand various options for `curl`. Run each of the three commands below in the terminal and, in a **new text cell** below, describe what happens in each case.

```text
curl https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/lab_08/file2

curl --remote-name https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/lab_08/file2

curl --output myfile2 https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/lab_08/file2
```

> **Note:** If you don't want to keep copy/pasting the long URL, you can assign it to a variable!
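
The variable approach mentioned in the note might look like the sketch below. The variable name `url` is an arbitrary choice, the example reuses `file1` from Exercise I, and `--max-time 30` simply tells `curl` to give up after 30 seconds.

```shell
# Store the long URL once (the name `url` is arbitrary, chosen for illustration)
url='https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/lab_08/file1'

# Confirm what the variable holds
echo "$url"

# Reuse the variable; the double quotes keep the URL intact as a single argument
curl --max-time 30 --remote-name "$url" || echo "download failed (is the network available?)" >&2
```

Note that the shell does not allow spaces around the `=` in the assignment.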

## Exercise II: A more complicated example

Click on this URL, or copy/paste into your browser:  
[https://service.iris.edu/irisws/fedcatalog/1/query?network=AV&channel=BHZ&startbefore=2020-01-01&format=text&level=station](https://service.iris.edu/irisws/fedcatalog/1/query?network=AV&channel=BHZ&startbefore=2020-01-01&format=text&level=station)

The content that you see is the result of a data request. The parameters for the request are included in the URL itself, in the query string that follows the `?`. In this case, you're looking at metadata for seismometers belonging to the Alaska Volcano Observatory (AVO). In particular, the query string has the following components, which are separated by `&`:

* `network=AV` — select the AVO seismic network, which has code "AV"
* `channel=BHZ` — this selects only vertical-component broadband seismometer channels, which have code "BHZ" ("Z" for vertical)
* `startbefore=2020-01-01` — only select stations which were operational before the beginning of 2020
* `format=text` — print in an easy-to-read format
* `level=station` — specifies to only include station information, rather than more detailed information

This URL-based request method is a common way to access data from online sources. Try modifying the URL in your browser with a different `startbefore` date and see how the output changes. You can also change the `network` to `AK` to view all of the corresponding stations belonging to the Alaska Earthquake Center, or `AT` to view National Tsunami Warning Center stations. Change `level` to `network` to view just the network-level information.
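
As a rough sketch, the same kind of request URL can be assembled in the terminal from its individual parameters, which makes it easy to change one value at a time. The variable names below (`base`, `network`, `channel`, `startbefore`, `url`) are illustrative choices, not anything required by the service.

```shell
# Everything before the "?" stays the same for every request
base='https://service.iris.edu/irisws/fedcatalog/1/query'

# Request parameters (edit these to change the request)
network='AV'
channel='BHZ'
startbefore='2020-01-01'

# Join the parameters with "&" after the "?"
url="${base}?network=${network}&channel=${channel}&startbefore=${startbefore}&format=text&level=station"

# Print the assembled URL; it can be pasted into a browser
echo "$url"
```

With these values, the printed URL matches the one given at the start of this exercise.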

## Deliverable 2  <font color='red'>(25 points)</font>

Using `curl` or `wget` along with other tools for examining text files, answer the following questions. Run your commands in the terminal, and **paste them into a new code cell below, along with your answers**.

**(a)** How many AV BHZ stations were operational before the beginning of 2020?  
**(b)** Which AV BHZ station(s) were operational before November 2005? Where in Alaska are they located?  
**(c)** How many total stations are there in the AT network?

> **Hints:**
> * You may find it necessary to enclose your URL in quotes (`'` or `"`), because characters such as `&` and `?` have a special meaning to the shell.
> * For **(a)**, `wc -l` might be helpful. Just make sure to account for any blank lines or headers at the start or end of the file; `head` and `tail` will be useful here! You may check your answer by opening your output file in the JupyterHub text editor, which offers line numbering.
> * For **(c)**, all you need to provide is `network`, `format`, and `level`. Use `level=network` and examine the output.
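
The line-counting hint can be tried out on a small made-up file first. The file name `sample.txt` and its contents below are invented purely for illustration and do not reflect the layout of the real station list, so check where the header and blank lines fall in your own download.

```shell
# Create a made-up file: one header line followed by three data lines
printf '%s\n' '# name | value' 'a | 1' 'b | 2' 'c | 3' > sample.txt

# Count every line, header included
wc -l < sample.txt

# Skip the first line, then count what remains
tail -n +2 sample.txt | wc -l

# Alternatively, drop lines that start with "#" before counting
grep -v '^#' sample.txt | wc -l
```

The first count includes the header line, while the other two count only the data lines.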

## Deliverable 3  <font color='red'>(25 points)</font>

For this final deliverable, we'd like you to put it all together by downloading your own dataset from a URL. Review the list of public data archives found on slide 15 of `lecture_09_unix_getting_data.pdf` on Canvas. Select one of these archives — or use your own alternative website — and:

* Download some data using `wget` or `curl`
* Inspect the data using a Unix command-line tool of your choice
* Comment on the structure of the data

You should run your commands in the terminal, but to answer this question please paste your commands for downloading and inspecting, and your commentary, into a **new code cell below**. (You don't need to provide the file that you downloaded, since we will be able to use your commands to obtain the file ourselves.)