# Lab1: Introduction to Python for Geophysics

**Course:** UAF GEOS419 - Solid Earth Geophysics  
**Instructors:** Bryant Chow and Carl Tape  

**Acknowledgement:** 
This set of notebooks was originally developed by Calum Chamberlain, Finnigan Illsley-Kemp, John Townend and El Mestel at [Victoria University of Wellington-Te Herenga Waka](https://www.wgtn.ac.nz) for use by Earth Science graduate students. They have been modified slightly by Bryant Chow for use in GEOS419: Solid Earth Geophysics, an undergraduate geophysics class at [University of Alaska Fairbanks](https://www.uaf.edu/uaf/).

**Motivation:** 
The notebooks here cover material that we think will be of particular benefit to those students with little or no previous experience of computer-based data analysis. We presume very little background in command-line or code-based computing, and have compiled this material with an emphasis on general tasks that a geophysics student might encounter on a daily basis. 

## Notebook 1: But why programming?

1. What, why and how do we program?
   - Why programming?
   - An example of lots of data
   - Why Python?
   - Jupyter notebooks
2. Using Python on your own computer
3. Hello World!
------------

# What, why and how do we program?

## Why programming (what's wrong with Excel!?)

- **Reproducibility:** If someone can't replicate your 
  work, why should we trust it to be true?
- **Safety:** Your data and your processing should not
  overlap.  Your raw data should be sacred.
- **Speed:** You want a result, and you want it yesterday... Learn how to write good code 
    (and change the clock on your computer) and you can...
- **Complexity:** Being able to solve complex problems logically, in a way that others can follow
    (and reproduce) is essential to natural sciences. *Hint: Writing good code is as much about the*
    *quality of your documentation as it is about the quality of your code*.
- **Data scale:** Data in natural sciences is noisy, and large. Ideally to understand the natural world
    we would have data from every place at every time throughout the Earth. We don't have that, but
    our datasets are growing...
    


<img alt="XKCD 2180 spreadsheets
    [Cueball is at his computer. In the air on either side of him are an angel version of Cueball, with a halo and wings, and a devil version of Cueball, with horns and a pitchfork. The angel's dialogue appears in regular print, while the devil's dialogue appears in white print in black speech balloons.]
    Angel: Don't use a spreadsheet! Do it right.
    Devil: But a spreadsheet would be so easy.
    Angel: In the long run you'll regret it!
    [Closeup on Cueball, the angel, and the devil.]
    Angel: Take the time to write real code.
    Devil: Just paste the data! Tinker until it works!
    Devil: Build a labyrinth of REGEXREPLACE() and ARRAYFORMULA()!
    Devil: Feel the power!
    [Closeup on the devil.]
    Angel (off-panel): Fight the temptation!
    Devil: Ever tried QUERY() in Google Sheets? It lets you treat a block of cells like a database and run SQL queries on them.
    [Another shot of Cueball at his computer with the angel and devil at either side.]
    Angel: Don't listen to
    Angel: ...wait, really?
    Devil: Yes, and let me tell you about IMPORTHTML()...
    Angel: Oooh..." align="center" style="width:60%" src="https://imgs.xkcd.com/comics/spreadsheets.png">

## An example of lots of data

Let's consider what happens if we're dealing with data from a fairly standard seismological network. Such a network typically involves:

- long durations (multi-year);
- multiple locations;
- modest sampling rates.

For example, seismic station [COLA](https://earthquake.alaska.edu/partnerstations/COLA), located at UAF's College International Geophysical Observatory (CIGO), has been operating since 1996.

COLA's seismometer records at 100 Hz (100 samples per second). How many samples per day does it record? First, we need to know how many seconds there are per day.

In [None]:
seconds_per_day = 60 * 60 * 24
print(f"There are {seconds_per_day} seconds in a day")

How many samples per day?

In [None]:
sampling_rate = 100.0
samples_per_day = seconds_per_day * sampling_rate
print(f"COLA records {samples_per_day} samples per day")

So, more than 8.5 million samples per day. But that is just for one channel: COLA has three channels (one vertical and two horizontal), so how many samples per day does the whole station record?

In [None]:
number_of_channels = 3
samples_per_day_per_station = samples_per_day * number_of_channels
print(f"One station records {samples_per_day_per_station} samples per day")

Nearly 26 million samples per day! So how many samples is that over the lifetime of COLA since it was deployed in 1996?

In [None]:
days_per_year = 365.25  # Accounting for leap years
samples_per_year = days_per_year * samples_per_day_per_station
print(f"COLA records about {samples_per_year} samples per year.")
samples_since_start = samples_per_year * (2025 - 1996)
print(f"Since 1996 COLA has recorded {samples_since_start} samples, that's {samples_since_start:.2E} samples!")

That's 274 **billion** samples, the same order of magnitude as the number of stars in the Milky Way galaxy! Try working with that in a spreadsheet... and that is just **one** station in the Alaska Earthquake Center's network of instruments across the state (https://earthquake.alaska.edu/network).

Of course, this is just one example of a large dataset - and it's hard to imagine a situation in which a scientist needs to work with all 274 billion measurements in a completely unstructured way. However, the data from COLA give an idea of the enormous volumes of data that can be encountered in geophysics.
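
To put the spreadsheet comparison in numbers, here is a rough sketch. It relies on one fact from outside this notebook: a single Excel worksheet holds at most 1,048,576 rows. The variable names are our own, and we assume one sample per row for a single channel.

```python
# A single Excel worksheet holds at most 1,048,576 rows
excel_max_rows = 1048576

sampling_rate = 100.0  # samples per second, as in the cells above

# ASSUMPTION: one sample per row, one channel only
seconds_per_sheet = excel_max_rows / sampling_rate
hours_per_sheet = seconds_per_sheet / 3600

# ":," adds thousands separators; ".1f" rounds to one decimal place
print(f"One worksheet holds {excel_max_rows:,} rows")
print(f"That is {seconds_per_sheet:,.0f} seconds, or {hours_per_sheet:.1f} hours, of one channel")
```

In other words, a full worksheet would cover only a few hours of a single COLA channel, which is why we reach for code instead.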

## Why Python?

In this course we will be using the Python programming language to help us learn how to automate tasks and analyze data in geophysics. Python is a relatively friendly language, but it still has lots of **rules** that you need to follow to make your code run. In these notebooks we will introduce some of those rules and start you on your way to [zen](https://www.python.org/dev/peps/pep-0020/).

So, why Python?
1. Open-source, community-driven (i.e., free!) software;
2. Simple syntax, fast to make mistakes and helpful error messages;
3. Community libraries to do lots of complex tasks 
   (e.g. [ObsPy](https://github.com/obspy/obspy/wiki) for seismology, [CartoPy](https://pypi.org/project/Cartopy/) for making maps and handling geographic projections, and [SciPy](https://www.scipy.org/) as an umbrella environment for computational science)

<img alt="xkcd 353 Python
    [A Cueball-like friend is talking to Cueball, who is floating in the sky.]
    Friend: You're flying! How?
    Cueball: Python!
    Cueball: I learned it last night! Everything is so simple!
    Cueball: Hello world is just print 'Hello, World!'
    Friend: I dunno... Dynamic typing? Whitespace?
    Cueball: Come join us! Programming is fun again! It's a whole new world up here!
    Friend: But how are you flying?
    Cueball: I just typed 'import antigravity'
    Friend: That's it?
    Cueball: ...I also sampled everything in the medicine cabinet for comparison.
    Cueball: But I think this is the python.
          " align="center" style="width:60%" src="https://imgs.xkcd.com/comics/python.png">

Python itself is a useful language in its own right, but one of *the best* things about Python is all the packages written to extend it. This means that you often don't have to write (much of) your own code! Most of the time someone out there knows better than you how to do something, so you get to use their code and focus on the important things.
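
As a small illustration of letting someone else's code do the work, the sketch below uses the `datetime` module (which ships with Python) to count days instead of relying on our 365.25-day approximation. The start and end dates are assumptions chosen for illustration; the exact day COLA began recording in 1996 is not given here.

```python
from datetime import date  # part of Python's standard library

# ASSUMPTION: illustrative dates only, not COLA's actual deployment date
start = date(1996, 1, 1)
end = date(2025, 1, 1)

# Subtracting two dates gives a timedelta; its .days attribute counts
# the days between them, with leap years handled for us
elapsed_days = (end - start).days
approximate_days = 365.25 * (2025 - 1996)

print(f"Days from {start} to {end}: {elapsed_days}")
print(f"Our 365.25-day approximation: {approximate_days}")
```

The two numbers differ by less than a day, so our earlier estimate was fine, but we did not have to write any leap-year logic ourselves to check it.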

Most Python packages (and all good ones) have documentation.  If you find yourself stuck, or thinking *I wish I could do this*, it is worth having a search online for what you want, or what you are stuck on. 
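
When you do get stuck, the error message itself is often the best thing to read carefully (and to search for). The sketch below makes a deliberate typo in a variable name and captures the resulting message. The `try`/`except` wrapper is only there so that the notebook keeps running; in an ordinary cell Python would stop and show the same message in a traceback.

```python
sampling_rate = 100.0

try:
    # Deliberate typo: "sampling_rat" was never defined
    samples_per_minute = 60 * sampling_rat
except NameError as error:
    # Keep the message so we can read it (or search for it online)
    message = str(error)
    print(f"Python says: {message}")
```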

## What are Jupyter notebooks and how do we use them?

This is a Jupyter notebook! [Jupyter notebooks](https://jupyter.org/) provide inline interactive Python shells - i.e. an interface to entering and running real Python code - interspersed with explanations and other details that are formatted in something known as "markdown". 

Notebooks are increasingly used to document the actual code scientists use for their analysis alongside their interpretations. In fact, [some scientific papers have now been written in Jupyter notebooks](https://github.com/jupyter/jupyter/wiki#a-gallery-of-interesting-jupyter-notebooks), which lets readers test the work directly. They are a great way to *show your work* while explaining what you did in more extensive prose. We are using them for teaching purposes because they let us play with the code and explain the ideas behind the code.

Notebooks like the ones we've prepared for these modules are designed to be used interactively in a web browser.  You should run through them, change some values, see what works, try and play
with variables and experiment.  There will be sections that you are expected to fill in
marked as **Exercise:**.  Please ask if and when you have problems.

In these notebooks we'll provide a brief introduction to Python for newcomers, providing you with the skills and understanding to complete the labs and homework in this course. There are many other great tutorials out there for those who want to go beyond what we learn here:
- [The Python tutorial](https://docs.python.org/3/tutorial/)
- [LearnPython](https://www.learnpython.org/)
- [Data Carpentry](https://datacarpentry.org/lessons/) and [Software Carpentry](https://software-carpentry.org/) for data literacy and research computing skills

Let us know if you want to play around with any other data and we can try to accommodate you!

# Getting started: "Hello World!"

The first program written in most languages is a simple "Hello World!" program that just prints the phrase "Hello World!"
to the screen. In Python this is embarrassingly simple (run the code by clicking the arrow button at the top, or by pressing
*ctrl-enter*):

In [None]:
print("Hello World!")

What we did was call the `print` function with the *argument* `"Hello World!"`. Enclosing *Hello World!* in
quotes tells Python that we want this to be a *string* type. Strings hold text; other types hold other
kinds of data, such as numbers.

The `print` function takes whatever we give it as an argument and prints it to the screen (in Jupyter notebooks we see the
output of our code just beneath the *cell* that we ran the code in).
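
As a quick preview of how quotes change the type, the sketch below uses Python's built-in `type` function on a few values. The variable names are our own, reused from the COLA example for illustration.

```python
greeting = "Hello World!"   # quotes make this a string (str)
sampling_rate = 100.0       # a decimal point makes this a float
number_of_channels = 3      # a whole number is an integer (int)

# type() reports what kind of data a value holds
print(type(greeting))
print(type(sampling_rate))
print(type(number_of_channels))

# print accepts several arguments and separates them with spaces
print(greeting, "COLA has", number_of_channels, "channels")
```

Try removing the quotes around `Hello World!` and running the cell again to see how Python reacts.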

In the next notebook, we'll look at different data types and start playing with real data.