# Introduction to Python

This tutorial is a crash course in Python for absolute beginners, that is, for people who have never written a line of code before. The combination of Python and Jupyter makes it particularly easy to get started right away. The tutorial covers the basics plus some advanced features that are useful when working with Jupyter for data science, but it does not cover everything; topics such as objects, lambdas, or templates are left out.

## Variables and Types

### Primitive Types

Primitive types are the most basic data types available. In Python, there are four: integers, floating-point values, booleans, and strings.

#### Integer Numbers

Integers, or ints, are whole numbers and can be positive as well as negative.

In [1]:
1

1

In [2]:
2+2

4

In [3]:
30 - 5*5

5

In [4]:
30 - 5*(2+3)

5

In [5]:
2**3

8

In [6]:
9/3

3.0

#### Floating-point Numbers

Floating-point numbers, or floats, represent real-valued numbers.

In [7]:
0.001

0.001

In [8]:
10.0**16

1e+16

In [9]:
1.e-3

0.001

In [10]:
1.e-16

1e-16

In [11]:
1.e3

1000.0

In [12]:
1.e16

1e+16

If we divide two integer values, we will obtain a float.

In [13]:
10/3

3.3333333333333335

We can force the result to be an integer by using the floor division operator `//`, which rounds the result down to the nearest integer. The modulo operator `%` returns the remainder of this division.

In [14]:
10//3

3

In [15]:
10%3

1
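The operators `//` and `%` belong together: one gives the rounded-down quotient, the other the remainder. The following sketch, with arbitrarily chosen values and variable names, illustrates how the two relate and what "rounding down" means for negative numbers.

```python
# Floor division (//) and modulo (%) complement each other.
a, b = 10, 3
quotient = a // b       # 3, the result rounded down
remainder = a % b       # 1, what is left over
print(quotient * b + remainder == a)   # True

# For negative numbers, // still rounds down (toward minus infinity).
negative_quotient = -10 // 3    # -4, not -3
negative_remainder = -10 % 3    # 2, because -4 * 3 + 2 == -10
print(negative_quotient, negative_remainder)

# If one operand is a float, the result is a float as well.
float_quotient = 10.0 // 3      # 3.0
print(float_quotient)
```

Note that `//` only yields an `int` if both operands are integers.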

A computer cannot represent all real-valued numbers exactly and must therefore use rounding. While individual rounding errors may be small, there are situations where they can add up and cause serious trouble.

In [16]:
0.1 + 0.2

0.30000000000000004
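Because of these rounding errors, comparing floats with `==` can give surprising results. One common way around this, sketched below, is to compare within a small tolerance using `math.isclose` from the standard library. The variable names are chosen only for illustration.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)               # False, because of the rounding error
print(math.isclose(total, 0.3))   # True, equal within a small tolerance

# Rounding errors can accumulate over many operations.
accumulated = 0.0
for _ in range(10):
    accumulated = accumulated + 0.1
print(accumulated == 1.0)                # False
print(math.isclose(accumulated, 1.0))    # True
```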

#### Boolean Expressions

Boolean expressions are expressions that result in a boolean value, or bool, which is either `True` or `False`. They are quite self-explanatory in Python syntax.

In [17]:
4 > 3

True

In [18]:
4 == 3

False

In [19]:
4 != 3

True

In [20]:
4 >= 3

True

In [484]:
4 >= 3 and 4>5

False

In [483]:
4 >= 3 and not 4>5

True

In [23]:
4 >= 3 or 4>5

True

#### Text Strings

In [24]:
"Hello"

'Hello'

In [25]:
type("Hello")

str

In [45]:
"Hello" + 'Data'

'HelloData'

In [27]:
"Number %d"%5

'Number 5'

In [28]:
"Number %s"%"Five"

'Number Five'

In [29]:
"Hello" == "Data"

False

In [31]:
"Hello" != "Data"

True

In [32]:
not "Hello" != "Data"

False

#### Exercises

What do the following expressions return?

1. `"Data" == 1`

2. `1 == 1 and 2 == 1`

3. `1 == 1 or 2 != 1`

4. `True or 1 == 0`

5. `1 != 0 and 2 == 1`

6. `not (1 == 1 and 0 != 1)`

7. `not (1 != 10 or 3 == 4)`

8. `not ("Data" == "Data" and "Hello" == "Yellow")`

9. `1 == 1 and not ("Data" == 1 or 1 == 0)`

10. `"Data" == "Yellow" and not (3 == 4 or 3 == 3)`

### Variable Assignment

In programming, variables are names that refer to a location in computer memory. By separating the value from the name, a variable allows us to refer to a value by name wherever we need it.

In [40]:
height = 10
width = 2.5

We can also define multiple variables in a single line of code.

In [34]:
height, width = 10, 2.5

If we use the name of a variable that has not been defined before, we get an error message.

In [38]:
depth

NameError: name 'depth' is not defined

Operations can also be performed on variables, as long as the underlying type understands what to do with the operator.

In [36]:
height * width

25.0

In [44]:
area = height * width
area

25.0

In [64]:
text = 'Hello Data'
text

'Hello Data'

#### Exercises

1. Suppose a person weighs 80 kg and is 1.80 m tall. Compute the body mass index from the weight ($w$) and the height ($h$): $$bmi=\frac{w}{h^2}$$

In [49]:
weight = 80
height = 1.80
bmi = weight/height**2
bmi

24.691358024691358

2. Create a string of the form "BMI=12.5" that contains the computed BMI.

In [54]:
"BMI=%s"%bmi

'BMI=24.691358024691358'

3. Assign a variable named `threshold` that defines a threshold for the BMI at 25. Then check if the BMI is less than 25.

In [61]:
threshold = 25
bmi < threshold

True

## Functions

Functions execute a block of code on given inputs and can return one or more outputs. They are used to isolate a piece of code, which avoids code repetition and allows testing. In this sense, they help to reduce redundancy and improve reliability.

One of the most basic and most used functions is `print`. It accepts a string (or any other value) as input and prints it to standard output. When used in Jupyter, `print` writes its text below the cell but does not produce an output cell.

In [65]:
print(text)

Hello Data


Now let us create our own function which outputs "Hello Data". Functions are defined by the keyword `def` followed by the function name, a pair of parentheses, and a colon at the end. The function body must be indented.

In [66]:
def hello_data():
    print("Hello Data")

In [67]:
hello_data()

Hello Data


Note that `print` is also a function, one that accepts a `str` as argument. Let us define our own print function that adds "Hello" at the beginning.

In [68]:
def hello_1(name):
    print("Hello %s"%name)

In [69]:
hello_1("Nils")

Hello Nils


If we call the function without its argument, we receive an error message.

In [70]:
hello_1()

TypeError: hello_1() missing 1 required positional argument: 'name'

A function can have multiple arguments which are separated by commas.

In [71]:
def hello_2(name1, name2):
    print("Hello %s"%name1)
    print("Hello %s"%name2)

In [74]:
hello_2("Nils","Sven")

Hello Nils
Hello Sven


None of the above functions had a return value. If we want a return value, we must add a `return` statement to the function body, usually as its last line.

In [95]:
def multiply(a, b):
    c = a*b
    return c

In [96]:
multiply(4,5)

20
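A function can also return more than one value by separating the values with a comma after `return`. The sketch below uses a hypothetical function `area_and_perimeter` that is not part of the tutorial; the caller receives both results and can assign them in a single line.

```python
def area_and_perimeter(height, width):
    """Return the area and the perimeter of a rectangle."""
    area = height * width
    perimeter = 2 * (height + width)
    return area, perimeter   # two values separated by a comma

# Both return values can be assigned in a single line.
rect_area, rect_perimeter = area_and_perimeter(4, 5)
print(rect_area, rect_perimeter)   # 20 18
```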

Another feature of Python is that we can provide function arguments that are optional and have a default value. For example, let us define a function that raises a value to the power of 2 by default and to another power if an additional argument is given.

In [97]:
def power(base, exponent=2):
    return base**exponent

In [98]:
power(10)

100

In [99]:
power(10, 3)

1000

In [100]:
power(10, exponent=3)

1000

#### Exercises
1. Create a function named `pow2` that accepts a number and returns its square. Test the function by passing the value 5.

In [103]:
def pow2(number, exp=2):
    return number**exp
pow2(5)

25

2. Create a function `bmi` that accepts weight and height as arguments and returns the BMI. Test the function by passing values `height=2` and `weight=50`.

In [105]:
def bmi(weight, height):
    return weight/height**2
bmi(height=2, weight=50)

12.5

3. What is the control flow of the following program:

```
def a():
    print("A")
def b():
    a()
    print("B")
    c()
def c():
    a()
    print("C")
b()
```

#### Built-in Python Functions

The function `type` returns the data type of a variable.

In [52]:
type(threshold)

int

In [53]:
type(area)

float

The function `sum` produces the sum of a collection.

In [646]:
sum([10,5,15,2,18,35,15])

100

The function `max` returns the maximum of a sequence of numbers, `min` the minimum.

In [56]:
max(40,20,-20,90)

90

In [57]:
min(40,20,-20,90)

-20

The function `abs` returns the absolute value of a number.

In [58]:
abs(-1.245)

1.245

The function `round` rounds a real number to the nearest integer.

In [59]:
round(1.245)

1

We can specify the number of digits for rounding, too.

In [60]:
round(1.245, 2)

1.25
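Two details of `round` can be surprising. Ties are rounded to the nearest even integer, and with `ndigits` the result depends on how the float is stored in binary. The values below are chosen for illustration; the last one is the example used in the Python documentation.

```python
# round() resolves ties by rounding to the nearest even integer.
half_cases = [round(0.5), round(1.5), round(2.5)]
print(half_cases)        # [0, 2, 2]

# With ndigits, the binary representation of the float matters:
# 2.675 is stored as a value slightly below 2.675.
rounded = round(2.675, 2)
print(rounded)           # 2.67
```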

But how do we know which arguments a function accepts? To get help on a function and its arguments, we can use the function `help`.

In [109]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



## Collections

A collection is a grouping of several data elements, like primitives or variables, that are easier to process if treated together. For example, if we want to work with a large number of values, we do not want to declare a separate variable for each value.

### Lists

A list is an ordered collection of elements with an index, also referred to as a sequence type. Elements that are contained in a list are enclosed in square brackets `[]` and separated by commas.

In [111]:
heights = [50,60,80,100,50,200]
heights

[50, 60, 80, 100, 50, 200]

We can access elements of the list by using an index. The first element of a list does not have the index 1, though, but 0, which can be tricky for those who are not used to it.

|index|0|1|2|3|4|5|
|-----|-|-|-|-|-|-|
|value|50|60|80|100|50|200|

To access the second element, we must use the index 1.

In [112]:
heights[1]

60

We can also change elements in the list. For example, we can assign a new value to the third element of our list.

In [113]:
heights[2] = 70
heights

[50, 60, 70, 100, 50, 200]

The function `len` can be used to find out the number of elements in a list (or any collection).

In [114]:
len(heights)

6

If our index is equal to or larger than the number of elements in the list, we get an error message.

In [115]:
heights[6]

IndexError: list index out of range

Python also allows some indexing magic, for example, if we want to access the last element of the list, we can use `heights[-1]` which is short for `heights[len(heights)-1]`.

In [116]:
heights[-1]

200

Python is not very strict when it comes to the data types of the elements of a list. It may not be good practice to use elements of different types in one list, but it is possible.

In [225]:
teacher = ["Nils", 37, True, [1.86, 86]]
teacher

['Nils', 37, True, [1.86, 86]]

#### Slicing

Python makes it quite easy to obtain a sub-list from a list, called a slice. The range of this sub-list is indicated by a `:`, whereby the numbers before and after it indicate the lower and upper bound of the range. Note that the upper bound is exclusive, which means that the element with the index of the upper bound will not be in the slice.

In [119]:
heights[0:2]

[50, 60]

If we provide no value for the bound, zero or the length of the list will be used as index of the bound.

In [120]:
heights[:2]

[50, 60]

In [123]:
heights[2:]

[70, 100, 50, 200]

In [124]:
heights[:]

[50, 60, 70, 100, 50, 200]

In [125]:
heights[:-2]

[50, 60, 70, 100]

In [232]:
heights[-2:]

[50, 200]

A slice can also appear on the left-hand side of an assignment, which allows us to exchange several elements of the list at once.

In [130]:
heights[:2] = [-10,-10]
heights

[-10, -10, 70, 100, 50, 200]
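It may help to separate two uses of slices. A slice that is read produces a new list, so changing that new list leaves the original untouched, whereas a slice on the left-hand side of an assignment modifies the original list. A minimal sketch with a hypothetical list `values`:

```python
values = [50, 60, 70, 100, 50, 200]

# A slice on the right-hand side creates a new list (a copy).
first_two = values[:2]
first_two[0] = 999
print(first_two)   # [999, 60]
print(values)      # [50, 60, 70, 100, 50, 200], unchanged

# A slice on the left-hand side replaces elements of the original list.
values[:2] = [-10, -10]
print(values)      # [-10, -10, 70, 100, 50, 200]
```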

If we include another colon `:` after the bounds, followed by a step size, we can skip one or more elements in the list, for example, selecting only every second element.

In [131]:
heights[::2]

[-10, 70, 50]

In [134]:
heights[1::2]

[-10, 100, 200]

#### Operations with Lists

In [235]:
widths = [2.5,3,5,7,10,5,18]

Unlike with primitive data types, not all operators are defined for a list. 

In [236]:
heights * widths

TypeError: can't multiply sequence by non-int of type 'list'

We also should not think of lists as vectors, which we can add together. The `+` operator, for example, does not add together the individual elements, but rather glues both lists together.

In [237]:
heights + widths

[-10, -10, 70, 100, 50, 200, 2.5, 3, 5, 7, 10, 5, 18]

In [238]:
heights + [10]

[-10, -10, 70, 100, 50, 200, 10]
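If element-wise addition is what we actually want, one possible approach in plain Python is to pair up the elements with the built-in function `zip`. The sketch below uses two short hypothetical lists and a list comprehension, a construct that is explained in the section on `for` statements.

```python
lengths = [1, 2, 3]
offsets = [10, 20, 30]

# + concatenates the two lists.
concatenated = lengths + offsets
print(concatenated)             # [1, 2, 3, 10, 20, 30]

# zip pairs up the elements, so that each pair can be added.
elementwise = [a + b for a, b in zip(lengths, offsets)]
print(elementwise)              # [11, 22, 33]

# Multiplying a list by an integer repeats it.
repeated = lengths * 2
print(repeated)                 # [1, 2, 3, 1, 2, 3]
```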

We can also check if an element is contained in a list.

In [136]:
30 in heights

False

Or we can get a sorted version of the list.

In [140]:
sorted(heights)

[-10, -10, 50, 70, 100, 200]

#### Exercises

1. Create a new list named `new_list` that contains values 0, 10, 5, 20, 15, 25, 35, 30.

In [141]:
new_list = [0, 10, 5, 20, 15, 25, 35, 30]

2. Print the 1st element of the list.

In [142]:
new_list[0]

0

3. Print the 2nd element of the list.

In [143]:
new_list[1]

10

4. Print the last element of the list.

In [144]:
new_list[-1]

30

5. Print every third element of the list starting with the second element.

In [145]:
new_list[1::3]

[10, 15, 30]

6. Print a sorted view of the list.

In [146]:
sorted(new_list)

[0, 5, 10, 15, 20, 25, 30, 35]

7. Assign the last element the value of the first element.

In [147]:
new_list[-1] = new_list[0]

8. Multiply the first with the third element of the list.

In [148]:
new_list[0] * new_list[2]

0

#### Optional: Operations on Lists
Most of the operations and functions above, such as `+`, `in`, and `sorted`, worked with the list but did not change the list itself. The following methods, in contrast, modify the list in place.

In [243]:
heights.append(10)
heights

[-10, -10, 70, 100, 50, 200, 10]

In [244]:
heights.insert(1,10)
heights

[-10, 10, -10, 70, 100, 50, 200, 10]

In [245]:
heights.remove(10)
heights

[-10, -10, 70, 100, 50, 200, 10]

The `remove` method compares each element of the list with the given argument and removes the first occurrence. This is normally not what we want to do. To remove an element at a certain index, we can use `del`. Note that `del` can also be used to delete variables.

In [246]:
del heights[0]
heights

[-10, 70, 100, 50, 200, 10]

In contrast to `sorted`, which returns a sorted copy of the list, the method `sort` operates on the list itself. This introduces an important distinction between operations that return a copy and operations that work in place, which is subtle but important when code efficiency is critical.

In [651]:
heights.sort()
heights

[-10, 10, 50, 70, 100, 200]
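The difference between `sorted` and `sort` can be made visible side by side. The following sketch uses a separate, hypothetical list `values` so that it does not depend on the current content of `heights`.

```python
values = [40, 70, 100, 50, 200, 10]

# sorted returns a new list and leaves the original untouched.
sorted_copy = sorted(values)
print(values)        # [40, 70, 100, 50, 200, 10]
print(sorted_copy)   # [10, 40, 50, 70, 100, 200]

# sort changes the list in place and returns None.
result = values.sort()
print(result)        # None
print(values)        # [10, 40, 50, 70, 100, 200]
```

A common beginner mistake is to write `values = values.sort()`, which replaces the list with `None`.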

### Strings

Strings are a special type of sequence that contains individual characters. In contrast to lists, which are mutable, strings are immutable, which means that once created, the characters they contain cannot be changed anymore.

In [99]:
text = "Hello Data"

As a string is a sequence of characters, each character has a position in the string.

In [150]:
text[0]

'H'

Individual elements of a string cannot be assigned a new character value.

In [101]:
text[0] = "B"

TypeError: 'str' object does not support item assignment

We can slice a string, just like a list.

In [102]:
text[1:-1]

'ello Dat'

And we can concatenate strings by using the `+` operator.

In [103]:
"Hello"+" "+"Data"

'Hello Data'

A more elegant way of string concatenation is by using placeholders, where `%s` stands for string, and `%d` stands for an integer number.

In [662]:
"%s %d"%("Number",5)

'Number 5'
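Besides the `%` placeholders, Python offers two other ways to build strings from values: the string method `format` and, from Python 3.6 on, f-strings. The sketch below shows all three next to each other; the variable names and values are only illustrative.

```python
label = "Number"
value = 5
ratio = 24.691358024691358

# The three lines below produce the same string.
with_percent = "%s %d" % (label, value)
with_format = "{} {}".format(label, value)
with_fstring = f"{label} {value}"
print(with_percent, with_format, with_fstring, sep=" | ")

# A format specification limits the number of decimal places.
rounded_text = f"{ratio:.2f}"
print(rounded_text)   # 24.69
```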

#### Strings and Numbers

Strings can be converted to numbers and vice versa.

In [104]:
answer_str = "42"
answer_str

'42'

In [105]:
type(answer_str)

str

In [106]:
answer_int = int(answer_str)
answer_int

42

In [107]:
type(answer_int)

int

In [108]:
str(answer_int)

'42'

Of course, only strings that represent numbers can be converted into numeric data types.

In [109]:
int("not a number")

ValueError: invalid literal for int() with base 10: 'not a number'

#### Useful Functions for Strings

Because strings contain nothing but text made up of characters, Python provides a number of useful string functions that are worth knowing about.

In [157]:
text.split()

['Hello', 'Data']

In [111]:
" ".join(["Hello","Data"])

'Hello Data'

In [112]:
text.endswith("a")

True

In [113]:
text.startswith("B")

False

In [114]:
text[0].isdigit()

False

In [153]:
text

'Hello Data'

In [154]:
"He" in text

True

Unlike list methods, functions that operate on strings never change the underlying string but always return a new string.

In [159]:
text2 = text.replace("H","B")
print(text2)
print(text)

Bello Data
Hello Data


#### Exercises

1. Create a variable named `river` that contains the string "Mississippi".

In [164]:
river = "Mississippi"

1. Replace the "s" in `river` with a "z"

In [165]:
river.replace("s","z")

'Mizzizzippi'

2. Count the number of occurrences of "s" in `river`.

In [166]:
river.count("s")

4

3. Return the index of the letter "p" in `river`.

In [171]:
river.index("p")

8

4. Split the sentence "This is a catch 42 case." by white spaces and assign the resulting list of strings the variable name `split_sentence`.

In [172]:
sentence = "This is a catch 42 case."
split_sentence = sentence.split()
split_sentence

['This', 'is', 'a', 'catch', '42', 'case.']

5. Convert the numeric element of the list to an int.

In [173]:
int(split_sentence[4])

42

### Optional: Tuples

Tuples are like lists but immutable, which means that once created, the elements they contain cannot be changed anymore. Hence, all operations that only read a list can also be done with tuples, but tuples do not support operations that modify them in place, like `append`, `remove`, or `sort`.

In [249]:
heights_tuple = (50,60,80,100,50,200)

In [250]:
heights_tuple[1]

60

In [251]:
heights_tuple[2] = 70

TypeError: 'tuple' object does not support item assignment

In [252]:
heights_tuple + (5,10)

(50, 60, 80, 100, 50, 200, 5, 10)

Tuples that have only one element are called singletons and must include a trailing comma; without it, the parentheses are treated as ordinary grouping.

In [253]:
heights_tuple + (5,)

(50, 60, 80, 100, 50, 200, 5)
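The effect of the trailing comma can be checked with `type`. In the following short sketch, the variable names are chosen for illustration.

```python
not_a_tuple = (5)     # just the integer 5 in parentheses
singleton = (5,)      # a tuple with one element

print(type(not_a_tuple))   # <class 'int'>
print(type(singleton))     # <class 'tuple'>
print(len(singleton))      # 1
```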

In [254]:
sorted(heights_tuple)

[50, 50, 60, 80, 100, 200]

In [255]:
heights_tuple.sort()

AttributeError: 'tuple' object has no attribute 'sort'

### Optional: Sets

Another useful collection is the set, which is an unordered collection of elements with no duplicate elements. A set is defined by curly braces `{}` and elements are separated by commas.

In [117]:
heights_set = {50,60,80,100,50,200}
heights_set

{50, 60, 80, 100, 200}

Unlike lists, sets do not have an index.

In [118]:
heights_set[1]

TypeError: 'set' object does not support indexing

Trying to insert a duplicate will not change the set.

In [119]:
heights_set.add(50)
heights_set

{50, 60, 80, 100, 200}

If we insert a new element it will receive an arbitrary position in the set.

In [120]:
heights_set.add(70)
heights_set

{50, 60, 70, 80, 100, 200}

Sets are more efficient than lists if we want to check whether an element is contained in the set.

In [121]:
30 in heights_set

False
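A typical use of sets is removing duplicates from a list. The sketch below assumes a hypothetical list `measurements` and also shows two of the set operations known from mathematics, intersection (`&`) and union (`|`).

```python
measurements = [50, 60, 80, 100, 50, 200, 60]

# Converting a list to a set drops the duplicates.
unique = set(measurements)
print(len(measurements), len(unique))   # 7 5

# Sets have no order, so we sort if we need a predictable sequence.
unique_sorted = sorted(unique)
print(unique_sorted)                    # [50, 60, 80, 100, 200]

# Sets also support operations known from mathematics.
common = unique & {50, 70, 200}         # intersection
combined = {1, 2} | {2, 3}              # union
print(common == {50, 200}, combined == {1, 2, 3})   # True True
```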

### Dictionaries

While lists accept elements of different types, this usage is rather uncommon. Instead of using a list, you can use a dictionary for that purpose, which accepts text strings as index. A dictionary is defined by curly braces `{}` and key-value pairs are separated by commas, whereby keys and values are separated by a colon `:`. Note that the keys of a dictionary behave like a set, hence there cannot be duplicates.

In [174]:
teacher = {"name": "Nils", "age": 37, "married": True, "bmi": [1.86, 86]}
teacher

{'name': 'Nils', 'age': 37, 'married': True, 'bmi': [1.86, 86]}

Dictionaries do not have a positional index. A number therefore only works as an index if it has been defined as a key, which is not the case here.

In [175]:
teacher[0]

KeyError: 0

Instead we must use a key as index.

In [235]:
teacher["name"]

'Nils'

To see which keys are in a dictionary, we can use the method `keys`.

In [177]:
teacher.keys()

dict_keys(['name', 'age', 'married', 'bmi'])
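Dictionaries are mutable, so entries can be added or overwritten after creation. The following sketch uses a small hypothetical dictionary `person` (the added city is an invented example value) and shows a few everyday operations: assignment by key, the `in` check, `get` with a default value, and `values`.

```python
person = {"name": "Nils", "age": 37}

# Add a new key-value pair or overwrite an existing one.
person["city"] = "Berlin"    # hypothetical value
person["age"] = 38

# Check for a key before accessing it, or use get with a default.
has_age = "age" in person
height = person.get("height", "n/a")
print(has_age, height)               # True n/a

# values returns the values, just as keys returns the keys.
all_values = list(person.values())
print(all_values)                    # ['Nils', 38, 'Berlin']
```

Using `get` avoids the `KeyError` that a missing key would otherwise raise.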

## Comments

Comments are used to document code. A well-written program contains at least as much commenting text as code. It also helps to give variables and functions names that describe their purpose. Instead of defining a function that counts numbers as `def c(d):`, we could call it `def count(numbers):`. For everything else, we can use comments.

An in-line comment starts with a `#`. The interpreter will ignore everything that follows it.

In [180]:
def count(elements):
    print(len(elements)) #print the length of the list

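If we need more space, we can use a multi-line string enclosed in triple quotes. Placed as the first statement of a function body, such a string is called a docstring and documents the function.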

In [188]:
def count(elements):
    """The count function prints the number of elements in a given collection.

    Args:
        elements: an arbitrary collection of elements
    """
    print(len(elements))
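Unlike a `#` comment, a triple-quoted string at the top of a function body is kept by Python and can be read back at runtime. The sketch below redefines `count` with a one-line description and reads that text from the `__doc__` attribute; calling `help(count)` would display the same text.

```python
def count(elements):
    """Print the number of elements in a given collection."""
    print(len(elements))

# The text is stored in the __doc__ attribute of the function.
description = count.__doc__
print(description)
```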

## Flow Control

### `if` Statements

The `if` statement is defined by the keyword `if` followed by a boolean expression and a colon. As with functions, the body of the statement must be indented. Any code inside the body will only be executed if the condition behind the `if` evaluates to `True`. Otherwise, either nothing happens, or code inside the `else` block will be executed.

In [192]:
x = 0
if x < 0:
    print("negative")
else:
    print("positive")

positive


Zero is neither positive nor negative, so we may want to distinguish this case. We can do this by using the `elif` keyword, which can be read as "else if". We can use `elif` whenever there are multiple cases.

In [201]:
x = 1000
if x < 0:
    print("negative")
elif x == 0:
    print("zero")
else:
    print("positive")

positive


`if` statements can also be used inline, for example to create conditional assignments. While this creates code that is less verbose, it may also decrease readability and hence maintainability.

In [130]:
sign = "negative" if x<0 else "positive"
sign

'positive'

#### Exercises

1. Create a sign function `sign1` that receives a number as input and returns 1 if the number is positive and -1 if it is negative. Test the function by passing it values -5, 0, 10.

In [205]:
def sign1(x):
    if x<0:
        return -1
    else:
        return 1
sign1(-5), sign1(0), sign1(10)

(-1, 1, 1)

2. Now create a sign function `sign2` that requires only one line of code. Test the function by passing it values -5, 0, 10.

In [206]:
def sign2(x):
    return -1 if x<0 else 1
sign2(-5), sign2(0), sign2(10)

(-1, 1, 1)

### `for` Statements

The `for` statement iterates over the elements of a given collection, like list, tuple, set, etc. The `for` statement is defined as `for <NAME> in <COLLECTION>:`. The code block after the colon gets executed for each element in the collection with the element being assigned the variable name `<NAME>`.

In [215]:
colors = ["red", "green", "blue", "yellow", "orange", "purple"]
for color in colors:
    print(color)

red
green
blue
yellow
orange
purple


Of course, we can also iterate over the result of a built-in function such as `sorted`.

In [209]:
for color in sorted(colors):
    print(color)

blue
green
orange
purple
red
yellow


If we want to iterate over a sequence of numbers, for example to access elements of a list, we can use the `range` function.

In [212]:
index = 0
print(index)
index = 1
print(index)
index = 2
print(index)

0
1
2


In [218]:
for index in range(len(colors)):
    print(colors[index])

red
green
blue
yellow
orange
purple


In [134]:
for i in range(1,6,3):
    print(colors[i])

green
orange


If we happen to need the index as well as the value, we can use `enumerate`.

In [135]:
for i, color in enumerate(colors):
    print(i, color)

0 red
1 green
2 blue
3 yellow
4 orange
5 purple


If we iterate over a dictionary, we always get the keys. We can use the keys to access the values of the dictionary.

In [221]:
colors = {"red":  (255,0,0),
          "green":(0,255,0),
          "blue": (0,0,255)}
for key in colors:
    print(key, colors[key])

red (255, 0, 0)
green (0, 255, 0)
blue (0, 0, 255)
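As an alternative to looking up each value by its key, the dictionary methods `items` and `values` let us iterate over key-value pairs or over the values only. The sketch below uses a hypothetical dictionary `rgb` with the same content as above.

```python
rgb = {"red":   (255, 0, 0),
       "green": (0, 255, 0),
       "blue":  (0, 0, 255)}

# items yields one (key, value) pair per entry.
names = []
for name, (r, g, b) in rgb.items():
    print(name, r, g, b)
    names.append(name)

# values yields only the values.
red_components = [value[0] for value in rgb.values()]
print(red_components)   # [255, 0, 0]
```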


Another nice syntactic sugar is the in-line generation of lists using the `for` statement, also known as list comprehension.

In [243]:
[i for i in range(5) if i>2]

[3, 4]
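To see what a list comprehension does, it can help to write it out as an ordinary loop. The following sketch compares both forms and adds a second comprehension that transforms each element; the variable names are chosen for illustration.

```python
# List comprehension from the text.
short_form = [i for i in range(5) if i > 2]

# Equivalent loop that builds the same list step by step.
long_form = []
for i in range(5):
    if i > 2:
        long_form.append(i)

print(short_form == long_form)   # True

# The expression before `for` can transform each element.
squares = [i**2 for i in range(5)]
print(squares)                   # [0, 1, 4, 9, 16]
```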

#### Exercises

1. Create a list of integers named `numbers` with values 0, 50, 60, 5, 40. Then iterate over the list and print all numbers that are greater than 10.

In [246]:
numbers = [0, 50, 60, 5, 40]
for number in numbers:
    if number>10:
        print(number)

50
60
40


2. Create a variable `total` and assign it the value 0. Then iterate over the list and assign `total` its own value plus the current element of the list of numbers. Then print out the value of `total`.

In [247]:
total = 0
for i in range(len(numbers)):
    total = total+numbers[i]
    print(total)

0
50
110
115
155


3. Create a variable `count` and assign it the value zero. Iterate over the list and increase the value of `count` by one if the current element is greater than zero. Then print out the value of `count`.

In [251]:
count = 0
for i in range(len(numbers)):
    if numbers[i]>0:
        count = count+1
print(count)

4


## Packages

Never write code twice. This also holds for code that others have written and published. Unless you have a good reason not to use existing code, do not rewrite a program that already exists. A good way to tap into the work of others is by using a library, referred to as packages in Python.

To import a package and make use of its functions, simply write `import` and the package name. The package name will now be part of the namespace. Let us import the math package.

In [252]:
import math

The math package gives you access to a number of functions that were unavailable so far, like `sqrt`, `sin`, `exp`, etc. To access a member of the package, you must use `<PACKAGE NAME>.<FUNCTION NAME>`.

In [253]:
math.sqrt(16)

4.0

If this seems too cumbersome, you can also give the package a shorter name by using `import <PACKAGE NAME> as <SHORT NAME>`.

In [None]:
import math as m

In [665]:
m.sqrt(16)

4.0

Or, if it is clear that you will only use the `sqrt` function of the math package, you can also import a single function or module by using `from <PACKAGE NAME> import <FUNCTION NAME>`.

In [None]:
from math import sqrt

In [666]:
sqrt(16)

4.0

Math does not only have functions, but also constants, like $\pi$.

In [601]:
math.pi

3.141592653589793
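Functions and constants of a package can be combined freely in our own expressions. As a small sketch, the following computes the area and circumference of a circle with an arbitrarily chosen radius and uses `math.floor` and `math.ceil` to round down and up.

```python
import math

radius = 3.0   # hypothetical example value

# Area and circumference of a circle.
area = math.pi * radius**2
circumference = 2 * math.pi * radius
print(round(area, 2), round(circumference, 2))   # 28.27 18.85

# floor always rounds down, ceil always rounds up.
rounded_down = math.floor(area)
rounded_up = math.ceil(area)
print(rounded_down, rounded_up)                  # 28 29
```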

#### Exercises

1. Import the `random` module.

In [288]:
import random

2. Get help on the function `shuffle` from the package `random`

In [289]:
help(random.shuffle)

Help on method shuffle in module random:

shuffle(x, random=None) method of random.Random instance
    Shuffle list x in place, and return None.
    
    Optional argument random is a 0-argument function returning a
    random float in [0.0, 1.0); if it is the default None, the
    standard random.random will be used.



3. Create a list named `ordered` with values 10, 20, 30, 40, 50, and pass the list to the `shuffle` function. Then print the values of the list.

In [287]:
ordered = [10,20,30,40,50]
random.shuffle(ordered)
ordered

[40, 20, 30, 50, 10]

### Package Management with `pip`

It may happen that a package is not available or has not yet been installed. In this case we can use the package manager `pip` to fetch the package from the Python package repository PyPI. A package that we will need soon is `numpy`. Although `numpy` ships with some installations, like Anaconda, it is not included in Python per se. To install a new package from PyPI on our system, we simply call `!pip install <PACKAGE NAME>` (or `pip3 install <PACKAGE NAME>` if Python 2.7 and 3.x are both installed on the system). Note that the `!` in the beginning tells Jupyter that this is not a Python but a shell command, which we can also use to copy, fetch, and unpack files if needed.

In [620]:
!pip3 install numpy
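Before installing a package, it can be useful to check whether it is already available. One possible way to do this, sketched here under the assumption that the import name equals the package name, is `importlib.util.find_spec` from the standard library. The helper `is_installed` is a hypothetical name introduced only for this example.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed("math"))     # True, part of the standard library
print(is_installed("numpy"))    # depends on the system
```

If the function returns `False`, the package can be installed with `pip` as described above.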



#### Exercises

1. Install the library matplotlib.

In [296]:
!pip3 install matplotlib

