# Intro to Python

### START ON COMMAND LINE

Check you're running Python 3:

In [1]:
!python -V

Python 3.7.1


In [2]:
3 + 2

5

In [3]:
a = 3

In [4]:
a + 2

5

In [5]:
import math
math.sqrt(10)

3.1622776601683795

## Getting help

In [6]:
math.__doc__

'This module is always available.  It provides access to the\nmathematical functions defined by the C standard.'

In [None]:
help(math)

In [None]:
dir(math)

In [None]:
dir(10)

We get some extra help from IPython:

In [None]:
math.  # Put your cursor after the dot and hit tab.

In [None]:
math.sqrt?

### Let's continue!

## Numbers and `math`

- Numbers and math operators, including modulo
- `int` and `float` (without talking about types per se)
- Assignment and dynamic typing
- Booleans and boolean operators
- `math` module
- `dir` and `help` etc.
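
As a quick illustration of the topics listed above, here is a minimal sketch. The variable names and values are made up for the example.

```python
import math

# Arithmetic operators, including floor division and modulo.
quotient = 7 / 2        # True division always gives a float: 3.5
floored = 7 // 2        # Floor division: 3
remainder = 7 % 2       # Modulo (the remainder): 1
power = 2 ** 10         # Exponentiation: 1024

# Dynamic typing: the same name can refer to an int, then a float.
depth = 1500            # An int.
depth = depth / 3       # Now a float: 500.0

# Booleans and boolean operators.
is_shallow = depth < 1000 and not depth < 0

# A function from the math module.
root = math.sqrt(16)    # 4.0

print(quotient, floored, remainder, power, depth, is_shallow, root)
```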

### Exercise

- Evaluate $x^2 + 3x - 7$ when $x = 5$. (You should get 33.)
- What is the log<sub>10</sub> of 7e7? (Use the `math` library.)
- What is the $\tan$ of $\pi$ rad? (Use the `math` library.)

In [None]:
# Part 1 – mathematical operators


In [None]:
# Part 2 – scientific notation and using a function from the math library


In [None]:
# Part 3 – trigonometric calculation


In [None]:
x = 5
x**2 + 3*x - 7

In [None]:
import math
math.log10(7e7)

In [None]:
math.tan(math.pi)

## Sneak peek at NumPy

In [None]:
import numpy as np

np.log(7)

In [None]:
before = np.load('../data/st-helens_before.npy')

before

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
plt.imshow(before)

## Strings

#### SWITCH TO NOTEBOOK

- Strings
- `len` and sequences
- Membership: `in`
- Concatenation and multiplication
- `str` and type casting (strong typing)
- Indexing, slicing
- String methods (objects, methods, functions)
- `upper`, `isupper`, `startswith`, `find`, `replace`
- Print, `\n` and escapes
- f-strings and `str.format()`

In [None]:
print("Hello world")

In [None]:
s = "Sandstone"

In [None]:
len(s)

In [None]:
'd' in s

In [None]:
# What happens if we try to add strings together?
'The porosity of ' + s + ' is: ' + 0.28

In [None]:
porosity = 0.28
str(porosity)

In [None]:
'The porosity of ' + s + ' is: ' + str(porosity * 100) + '%'

In [None]:
type(s)

In [None]:
# Indexing and slicing
# s[start:stop:stride]
s[3]
s[0]
s[-1]
s[::-2]
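
In a notebook, only the last expression in a cell is displayed, so it can help to name or print each result. A small sketch using the same string (the variable names are just for illustration):

```python
s = "Sandstone"

first = s[0]             # 'S', because indexing starts at 0.
fourth = s[3]            # 'd'
last = s[-1]             # 'e', negative indices count from the end.
head = s[:4]             # 'Sand', the stop index is not included.
tail = s[4:]             # 'stone'
backwards_by_two = s[::-2]   # 'eosnS', every second character, reversed.

print(first, fourth, last, head, tail, backwards_by_two)
```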

In [5]:
# Methods.
s.lower()


In [None]:
a = 'Sandstone is a type of stone.'
a.find('stone')

In [None]:
a = '123,456 m'
print(a.replace(',', '\t'))

In [None]:
s.lower().replace('sand', 'silt').title()

In [None]:
# formatting strings 2 ways.
n = 0.1 + 0.2

print('n is {:.3f}'.format(n))

print(f'n is {n:.2f}')

In [None]:
# A note about escape characters and unicode
print(u'It\'s time for \U0001F355')

### Exercise

- What types are:
    - `'Jurassic'`
    - `5`
    - `3.1416`
    - `s`
    - `math`
    - `math.log`
    - `str`
- Use the `math` module and an f-string to print $\mathrm{e}$ (the base of the natural logarithm) to 3 decimal places. 
- Change `'JURASSIC*PERIOD\n'` to lower case.
- Change the `'*'` to a space, change everything to title case, and remove the new line.
- For a bonus point, make sure your expression also works for `'CARBONIFEROUS*PERIOD\n'`
- For another bonus point, do this all in a single expression.

In [None]:
# What types are: 'Jurassic', 5, 3.1416, s, math, math.log, and str?








In [None]:
# print e in an f-string with 3 decimal places


In [None]:
# Change the following string to lower case
s = 'JURASSIC*PERIOD\n'


In [None]:
# Change the '*' to a space, change everything to title case, and remove the new line character


In [None]:
# Make sure the expression above also works for 'CARBONIFEROUS*PERIOD\n'
s2 = 'CARBONIFEROUS*PERIOD\n'


In [None]:
# Bonus point: do this all in a single expression (one line)
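
One possible set of answers to the string exercises, offered as a sketch rather than the only way to do it. The names `e_text`, `lowered`, `tidy` and `tidy2` are made up for this example.

```python
import math

# Print e to 3 decimal places with an f-string.
e_text = f'e is approximately {math.e:.3f}'
print(e_text)

s = 'JURASSIC*PERIOD\n'
s2 = 'CARBONIFEROUS*PERIOD\n'

# Lower case; note the '*' and the newline are still there.
lowered = s.lower()

# Replace the '*', change to title case and strip the newline,
# chained into a single expression that works for both strings.
tidy = s.replace('*', ' ').title().strip()
tidy2 = s2.replace('*', ' ').title().strip()

print(tidy)
print(tidy2)
```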


## Another sneak peek at NumPy

In [None]:
np.max(before)

In [None]:
before.max()

In [None]:
np.mean(before)

In [None]:
before.mean()

## `if ... else`

- White space in Python
- Basic pattern
- `elif` for mutually exclusive options, e.g. when parsing a file, or comparing a number to various ranges.
- One-liner: `lithology = "sand" if GR < 50 else "shale"`
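
The one-liner in the last bullet is a conditional expression. Here is a small sketch showing it next to the equivalent `if ... else` block; the gamma-ray value of 42 is a made-up number for illustration.

```python
GR = 42.0  # Hypothetical gamma-ray reading.

# Conditional expression: value_if_true if condition else value_if_false.
lithology = "sand" if GR < 50 else "shale"

# The same logic written as a full if ... else block.
if GR < 50:
    lithology_long = "sand"
else:
    lithology_long = "shale"

print(lithology, lithology_long)
```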

In [None]:
string = 'UWI 6/09239/45-0001'

if '/' in string:
    print('Valid')

In [None]:
# Build up example
if string.startswith('UWI') and ('/' in string):
    print('Valid')
else:
    print('Not valid')
print('Done')

In [None]:
string = '2340 m'  # or 12300 ft

if 'm' in string:
    units = 'M'
elif 'f' in string:
    units = 'FT'
else:
    units = None
    
print(units)

This diagram might help make sense of `if` (then again it might not!). In particular, it shows how the different branches `if` and `else`, or the different branches of `if`, `elif`, and `else`, are *mutually exclusive*. There is no scenario where more than one 'branch' runs.

<img src="../images/python_if_elif_else.png">
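
To see the mutual exclusivity in action, here is a minimal sketch in which more than one condition is true but only the first matching branch runs. The depth value and the zone names are invented for the example.

```python
depth = 2340  # Hypothetical depth in metres.

# Both of the first two conditions are true for 2340,
# but Python stops at the first one that matches.
if depth > 1000:
    zone = 'deep'
elif depth > 100:
    zone = 'intermediate'
else:
    zone = 'shallow'

print(zone)
```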

## Lists

- Split `'Triassic Jurassic Cretaceous'`
- "Just another sequence"
- Numbers in a list, plot them
- Making a list, syntax: parentheses, brackets
- Heterogeneous lists, nested lists
- Indexing and slicing (brackets when slicing)
- `append`, `index`, `pop`
- Mutability, compared to strings, `append`
- Mutability gotcha
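
A compact sketch of several of the points above, using the same velocity values that appear in the cells below. The names `alias` and `independent` are made up here to illustrate the mutability gotcha.

```python
vp = [2350, 2400, 2630, 2650, 2700, 2660]

# Indexing and slicing use square brackets, just like strings.
first = vp[0]         # 2350
last_two = vp[-2:]    # [2700, 2660]

# List methods.
vp.append(2710)               # Adds to the end, in place.
position = vp.index(2630)     # 2
removed = vp.pop()            # Removes and returns the last item: 2710

# Mutability gotcha: assignment gives a second name, not a copy.
alias = vp
alias[0] = 9999               # vp[0] is now 9999 too.

# A (shallow) copy is independent at the top level.
independent = vp.copy()
independent[0] = 2350         # vp is unchanged by this.

print(vp)
print(independent)
```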

In [None]:
periods = 'Triassic Jurassic Cretaceous'
list_of_periods = periods.split()

In [None]:
# Numbers in a list, plot them.
vp = [2350, 2400, 2630, 2650, 2700, 2660]


In [None]:
# Making a list with [].


In [None]:
# Heterogeneous lists and nested lists.
complex_list = [0, 1, 2.5, math.pi, list_of_periods, 'STRINGS', [100,200,300]]

In [None]:
# Indexing and slicing (square brackets when slicing).


In [None]:
# List methods: append, index, pop.


In [None]:
# Mutability compared to strings, `append`.
s[5] = 'S'
# compared to 
s.replace('s','S')
#

In [None]:
# Deep copy.
# Don't get into this unless it comes up.

a = [1, 2, 3, 4, 5]
b = [100, 200, 300, a]
c = b.copy()
c[-1][0] = 1000
print(a)  # Got mutated.

from copy import deepcopy
x = [1, 2, 3, 4, 5]
y = [100, 200, 300, x]
d = deepcopy(y)
d[-1][0] = 1000
print(x)  # Not mutated: the deep copy d is fully independent of x.

### Exercise

- Split this string into a list called `lithologies`: `'Sandstone, Shale, Limestone, Dolomite, Basalt, Granite'`.
- Use `sorted()` to sort the list. Can you sort it backwards? 
- Copy the list to a new name, `rocks`.
- Add the following rocks to the new list: `'Gypsum'`, `'Halite'` (and make sure they're not in the old one).
- Change the second element to `'Mudstone'`.

In [None]:
# Split this string into a list called 'lithologies'
str_with_rocks = 'Sandstone, Shale, Limestone, Dolomite, Basalt, Granite'


In [None]:
# Use sorted() to sort this list. Can you sort it backwards?


In [None]:
# Copy the list to a new name called `rocks`


In [None]:
# Add the following rocks to this new list: 'Gypsum', 'Halite'
# Also make sure they are not in the old one



In [None]:
# Change the second element in `rocks` to 'Mudstone'
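
One way to work through the list exercises, as a sketch. It assumes the copy is made from the unsorted list; copying the sorted one would be just as valid.

```python
str_with_rocks = 'Sandstone, Shale, Limestone, Dolomite, Basalt, Granite'

# Split on comma-plus-space so the items have no leading spaces.
lithologies = str_with_rocks.split(', ')

# sorted() returns a new list and leaves the original alone.
ascending = sorted(lithologies)
descending = sorted(lithologies, reverse=True)

# Copy, so that changes to rocks do not affect lithologies.
rocks = lithologies.copy()
rocks.extend(['Gypsum', 'Halite'])

# The second element has index 1.
rocks[1] = 'Mudstone'

print(lithologies)
print(rocks)
```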


## More NumPy

In [None]:
before

Slicing into arrays:

Math with a list, compared to arrays:
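
A minimal sketch of both ideas, using small made-up arrays rather than the St Helens data so that it runs without the data files. It assumes NumPy is installed.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# Multiplying a list repeats it; multiplying an array is elementwise.
repeated = values * 2     # [1, 2, 3, 1, 2, 3]
doubled = arr * 2         # array([2, 4, 6])

# Slicing a 2D array: rows first, then columns.
grid = np.arange(12).reshape(3, 4)
first_row = grid[0]       # The whole first row.
block = grid[:2, 1:3]     # First two rows, columns 1 and 2.

print(repeated)
print(doubled)
print(block)
```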

In [None]:
after = np.load('../data/st-helens_after.npy')

## `for ... in ...` (for each)

- Basic pattern on a list
- Basic pattern on a string
- Print 'sand' or 'shale' for porosities `[0.03, 0.01, 0.19, 0.12, 0.21, 0.05, 17.5, 3.0]`
- If we've covered `dict`, then mention stepping over `dict.items()`
- List comprehension &mdash; with strings
- `continue` and `break`

In [None]:
porosities = [0.03, 0.01, 0.19, 0.12, 0.21, 0.05, 17.5, 3.0]

In [None]:
# Recall we have our list of rocks from before:
rocks

In [None]:
for rock in rocks:
    print(rock.upper())

In [None]:
regular = []
for porosity in porosities:
    
    if porosity > 1:
        porosity /= 100
    print(regular)

    regular.append(porosity)
    
regular
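
Combining the loop with an `if ... else` gives the 'sand' or 'shale' example from the list above. This is only a sketch: the 10% porosity cutoff is an assumption made for illustration, not a rule from this notebook.

```python
porosities = [0.03, 0.01, 0.19, 0.12, 0.21, 0.05, 17.5, 3.0]
cutoff = 0.10  # Hypothetical cutoff between 'shale' and 'sand'.

labels = []
for porosity in porosities:
    # Values above 1 are assumed to be percentages, as in the cell above.
    if porosity > 1:
        porosity /= 100

    if porosity >= cutoff:
        label = 'sand'
    else:
        label = 'shale'

    labels.append(label)
    print(f'{porosity:.3f} {label}')
```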

In [None]:
[rock.upper() for rock in rocks]

In [None]:
# for-else
# Don't get into this unless it comes up.

for n in [0, 1, 2, 3, 4, 5, 6]:
    if n == 5:
        break
    elif n > 2:
        print(n)
    else:
        continue
else:
    print('Loop finished normally')
    
# Note if we break the loop (in the if), we could
# do 'something special' before breaking.
# But you can't do something special when *not*
# breaking... except with the else.

### Exercise

Rearrange the following lines to loop over a list of files and gather the second part of the file names &mdash; the months &mdash; into a new list, then print the new list.

When the code runs, it should produce:

    ['Jan', 'Mar', 'Jun', 'Jun']


In [None]:
print(months)
files = ['MH_Jan-18.png', 'MH_Mar-18.png', 'MH_Jun-18.png', 'MH_Jun-17.png']
months.append(month)
month = file.split('_')[1].split('-')[0]
for file in files:
months = []

In [None]:
files = ['MH_Jan-18.png', 'MH_Mar-18.png', 'MH_Jun-18.png', 'MH_Jun-17.png']
months = []
for file in files:
    month = file.split('_')[1].split('-')[0]
    months.append(month)
print(months)

### Exercise

- Make a loop to print the squares of the numbers up to 10. Start with `for n in range(1, 11):`
- Modify your loop to capture the squares in a new list, instead of printing them.
- For a bonus point, do this with a list comprehension instead.
- Add an `if` to only collect the squares of even numbers (to the `for` loop version; it's a bit harder with the list comprehension).
- Loop over the following list and make a new list of 1000 times each value. However, if the value is -999.25, skip it:
    - `[2.3, 2.4, 2.8, -999.25, 2.1, -999.25, 2.5, 2.5]`

In [None]:
squares = []
for n in range(1, 11):
    if n % 2 == 0:
        squares.append(n**2)
squares

In [None]:
[n**2 for n in range(1, 11) if n % 2 == 0]

In [None]:
data = [2.3, 2.4, 2.8, -999.25, 2.1, -999.25, 2.5, 2.5]
[1000*d for d in data if d != -999.25]

Again, this diagram might help... but it might not.

<img src="../images/python_for.png">

## Dictionaries

- Not a sequence, but a database-like mapping of k, v pairs
- Forming with `{...}`
- Keys and values
- Retrieving a value using a key
- Adding keys
- Deleting keys with `pop`
- `in`
- `get` (a bit like a database)
- `update` with `{k: v}`, or just do `dict[k] = v`

In [None]:
log = {'mnemonic': 'GR',
       'mean': 65.0,
       'tool': 'NGT' }
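
The basic operations from the list above might look like this on a dictionary like `log`. This is a sketch; the `'units'` key and its value are made up for the example.

```python
log = {'mnemonic': 'GR',
       'mean': 65.0,
       'tool': 'NGT'}

# Retrieve a value using its key.
mnemonic = log['mnemonic']

# Add a new key (hypothetical key and value).
log['units'] = 'API'

# Membership tests look at the keys.
has_tool = 'tool' in log

# pop removes a key and returns its value.
tool = log.pop('tool')

# get returns a default instead of raising KeyError for a missing key.
missing = log.get('tool', 'unknown')

print(log)
```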

In [None]:
mineral = {'name': 'quartz',
           'colour': 'none',
           'formula': 'SiO2',}

# Add hardness, streak, etc.

In [None]:
# Dictionary update

# This is worded pretty weirdly:
mineral.update?

# Let's see the various ways you can do it.

In [None]:
# Update with iterable of 2-tuples:
D = {'name': 'Sandstone', 'density': 2340}
E = [('velocity', 2200), ('phi', 0.19)]
D.update(E)
D

In [None]:
# Update with another dict:
D = {'name': 'Sandstone', 'density': 2340}
F = {'velocity': 2200, 'phi': 0.19}
D.update(F)
D

In [None]:
# Update with keyword arguments:
D = {'name': 'Sandstone', 'density': 2340}
D.update(velocity=2200, phi=0.19)
D

In [None]:
# Updating with keywords is the same as dict unpacking F:
D = {'name': 'Sandstone', 'density': 2340}
D.update(**F)
D

### Exercise

Using the dictionary called `periods`, complete the following tasks:

- Retrieve the start of the Jurassic.
- Use this call in an f-string to print:
      'The Jurassic started about 201 Ma ago.'
- Add the Permian, starting at 298.9 Ma.
- The Quaternary has the wrong age; change it to 2.58 Ma.
- The Palaeogene has the wrong spelling; change it.
- Get a sorted list of the start ages (note, you don't need to sort the entire dictionary, which is a little tricky; you just need to sort the ages).
- Stretch goal: Make a new dictionary with values that also contain the uncertainty in the ages (you can make up the uncertainties).
- Stretch goal: Loop over the dictionary, printing the sentence in the second question (above) for each one.

In [None]:
periods = {
    'Triassic': 251.9,
    'Jurassic': 201.3,
    'Cretaceous': 145.0,
    'Palogene': 66.0,
    'Neogene': 23.03,
    'Quaternary': 2.18,
}

In [None]:
# Retrieve the start of the Jurassic.


In [None]:
# Use this in an f-string to print, 'The Jurassic started about 201 Ma ago.'


In [None]:
# Add the Permian period to the dictionary, which started at 298.9 Ma


In [None]:
# The Quaternary has the wrong age; change it to 2.58 Ma.


In [None]:
# The Palaeogene has the wrong spelling; change it.


In [None]:
# Get a sorted list of the start ages.


In [None]:
# Make a new dictionary with values that also contain an uncertainty in ages


In [None]:
# Loop over the dictionary, printing the sentence in the second question (above) for each item


In [2]:
periods['Jurassic']
# Or periods.get('Jurassic')


In [None]:
f"The Jurassic started about {periods.get('Jurassic'):.0f} Ma ago."

In [None]:
periods['Permian'] = 298.9

In [None]:
periods['Quaternary'] = 2.58

In [None]:
periods['Palaeogene'] = periods.pop('Palogene')

In [None]:
sorted(periods.values())

In [None]:
periods = {
    'Triassic': (251.9, 0.24),
    'Jurassic': (201.3, 0.9),
    'Cretaceous': (145.0, 1),
    'Palaeogene': (66.0, 0.1),
    'Neogene': (23.03, 0.3),
    'Quaternary': (2.58, 0.001),
}

In [None]:
# This dict comprehension builds a dictionary version of the above:
{k:{'start': s, 'uncert': u} for k, (s, u) in periods.items()}
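
The last stretch goal, looping over the dictionary and printing the sentence for each period, might look like the sketch below. To keep it self-contained it uses a shortened copy of the original dictionary, with plain ages as values; the name `sentences` is made up for the example.

```python
periods = {
    'Triassic': 251.9,
    'Jurassic': 201.3,
    'Cretaceous': 145.0,
}

sentences = []
# items() gives (key, value) pairs, which we unpack into two names.
for name, start in periods.items():
    sentence = f'The {name} started about {start:.0f} Ma ago.'
    sentences.append(sentence)
    print(sentence)
```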

In [None]:
# Advanced learners often ask about sorting the whole dictionary on its values.
# You can do this in a loop (see below), but here's a quick dict comp that does it.
# Warning: it uses three things that are not topics for beginners:
# a dict comprehension, a lambda, and `sorted`'s `key` argument.
{k: v for k, v in sorted(periods.items(), key=lambda x: x[1])}

In [None]:
# Here's a loop you can build up from scratch if you really get into it.
sorted_dict = {}
for age in sorted(periods.values()):
    # print(age)
    for item in periods.items():
        if item[1] == age:
            # print(item)
            sorted_dict[item[0]] = age
            
sorted_dict

### Optional exercise: counting with a dictionary

Dictionaries are useful for counting things. Can you loop over the list of strings `strat` and make a dictionary that maps each unique layer type to the count of layers for that type?

You should end up with a dictionary that looks similar to:

    {'marl': 169,
     'sandstone': 178,
     'shale': 169,
     'dolomite': 168,
     'limestone': 145,
     'siltstone': 171}

To do this, we'll first generate some fake data.

We can start with a small number of 'layers', to make sure our code is working. Then we can process any number of layers.

In [3]:
import random

rocks = ['marl', 'sandstone', 'shale', 'dolomite', 'limestone', 'siltstone']
strat = [random.choice(rocks) for _ in range(10)]
strat

['marl',
 'sandstone',
 'siltstone',
 'limestone',
 'limestone',
 'sandstone',
 'siltstone',
 'shale',
 'limestone',
 'marl']

In [4]:
# Your code here.


In [None]:
counts = {}
for layer in strat:
    if layer in counts:
        counts[layer] += 1
    else:
        counts[layer] = 1

counts

In [None]:
counts = {}
for layer in strat:
    counts[layer] = counts.get(layer, 0) + 1
    
counts

In [None]:
# Using collections.defaultdict
from collections import defaultdict
counts = defaultdict(int)
for layer in strat:
    counts[layer] += 1
    
counts

In [None]:
# Using collections.Counter
from collections import Counter

counts = Counter(strat)

counts

### Exercise: collecting

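Let's say the layers are 1 m thick, so each layer's index in `strat` is the depth of its top in metres. We would like a dictionary giving the tops of all the reservoir units (sandstone and dolomite), the tops of the seals (shale and marl), and the tops of everything else. For the ten layers shown above, the result would be:

    {'reservoir': [1, 5],
     'seal': [0, 7, 9],
     'other': [2, 3, 4, 6, 8]}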

In [None]:
reservoirs_and_seals = {'reservoir': [],
                        'seal': [],
                        'other': []
                       }
for depth, layer in enumerate(strat):
    if layer in ['sandstone', 'dolomite']:
        reservoirs_and_seals['reservoir'].append(depth)
    elif layer in ['shale', 'marl']:
        reservoirs_and_seals['seal'].append(depth)
    else:
        reservoirs_and_seals['other'].append(depth)

reservoirs_and_seals

In [None]:
# Use defaultdict and eliminate the switch:
reservoirs_and_seals = defaultdict(list)

layer_types = {
    'sandstone': 'reservoir',
    'dolomite': 'reservoir',
    'shale': 'seal',
    'marl': 'seal',
}

for depth, layer in enumerate(strat):
    layer_type = layer_types.get(layer, 'other')
    reservoirs_and_seals[layer_type].append(depth)

reservoirs_and_seals

In [None]:
# We could define the mapping more compactly, then invert it.
layer_types_compact = {
    'reservoir': ['sandstone', 'dolomite'],
    'seal': ['shale', 'marl'],
}
layer_types = {}
for layer_type, layer_names in layer_types_compact.items():
    for layer_name in layer_names:
        layer_types[layer_name] = layer_type
        
# Then do as before.
reservoirs_and_seals = defaultdict(list)
for depth, layer in enumerate(strat):
    layer_type = layer_types.get(layer, 'other')
    reservoirs_and_seals[layer_type].append(depth)

reservoirs_and_seals

In [None]:
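# Using itertools.groupby(), which only groups consecutive items, so sort by type first.
from itertools import groupby

key = lambda item: layer_types.get(item[1], 'other')  # item is (depth, layer)
pairs = sorted(enumerate(strat), key=key)
{k: [depth for depth, _ in g] for k, g in groupby(pairs, key=key)}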

In [None]:
# Without sorting, groupby() yields one group per run of consecutive layers.
for n, g in groupby(strat, key=lambda x: layer_types.get(x, 'other')):
    print(n, list(g))

---

## Other great places to pick up Python:

- [Learn X in Y minutes](https://learnxinyminutes.com/docs/python3/) — If you just want to get cracking.
- [Stavros](https://www.stavros.io/tutorials/python/) — If you want to know a bit more.
- [Robert Johansson's lectures](Lecture-1-Introduction-to-Python-Programming.ipynb)
- [Tutorials Point](http://www.tutorialspoint.com/python/python_quick_guide.htm) — Another option.
- [Code Academy](https://www.codecademy.com/learn/learn-python-3) — A more sedate pace.
- [Udacity Intro to Computer Science](https://www.udacity.com/course/intro-to-computer-science--cs101) — Fantastic but a serious undertaking.
- [All the tutorials!](https://wiki.python.org/moin/BeginnersGuide/Programmers)

**WARNING** There's still a lot of Python 2 around. Keep away from it if you can! Python 3 has lots of advantages, and there are hardly any libraries now that have not made the switch.

----

## Python is...

- Not just a scripting language.
- Interpreted, not compiled.
- Strongly typed — types are enforced.
- Dynamically, implicitly typed — you don't have to declare variables.
- Case sensitive — var and VAR are two different variables.
- Object-oriented — everything is an object.
- Supportive of functional and procedural styles.
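
A short sketch of what a few of these points mean in practice. The variable names are invented for the example.

```python
# Dynamically typed: a name can be rebound to a different type.
value = 3
value = 'three'

# Strongly typed: Python will not silently mix str and float.
try:
    result = 'porosity: ' + 0.28
except TypeError as error:
    result = f'TypeError: {error}'

# Case sensitive: these are two different variables.
var = 1
VAR = 2

# Everything is an object, even a function or a module.
everything_is_object = isinstance(len, object) and isinstance(3, object)

print(result)
print(var, VAR, everything_is_object)
```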

In [None]:
import this

## Nice Python

- Read [PEP8](https://www.python.org/dev/peps/pep-0008/).
- Use an IDE or use a linter in your text editor.
- Write docstrings: think about your users and colleagues and your future self!
- Write code that doesn't need a lot of inline comments.


<hr />

<div>
<img src="https://avatars1.githubusercontent.com/u/1692321?s=50"><p style="text-align:center">© Agile Geoscience 2018</p>
</div>