# Software Carpentry @EMBL 2017

## Day 0: Introduction to Python

1. [Introduction](#Introduction)
2. [Variables](#Variables)
3. [Data Structures](#Data-Structures)
4. [Loops](#Loops)
5. [Conditionals](#Conditionals)

### Introduction

This material provides an introduction to the Python programming language. It is intended to provide the learner with a basic understanding of Python: the syntax, the important concepts, and the fundamental data structures. This lesson will provide the foundation for the 'core' Python programming sessions on Days 2 & 3.

If you are attending Day 0 of the course, there is no need to work through this notebook on your own - we will cover the material during the optional session. However, if you aren't attending the optional half-day, please look through this material and make sure that you are familiar with the concepts introduced here.

#### Getting Python

If the computer you're working on has a Mac OSX or Linux operating system, then Python is already installed. If not, or if you would like to make sure that you have a more up-to-date version to work with, you will need to install Python. We recommend installing the __Anaconda distribution__, which you can download from here: https://www.anaconda.com/download/ . All our course material is written for version 3.6, but we provide notes where things are different for version 2.7 (more about Python versions during the course!). <font color='firebrick'>_Note: clicking on the download links on that webpage will cause a pop-up to open, asking for your email address - just click "No thanks" if you'd prefer not to give your details: you can download the installer anyway._</font>

There are several advantages of using Anaconda, even if you have a native installation already available:

- you get a more up-to-date version of the language
- it comes with many of the modules commonly required for performing data analysis, statistics, visualisation etc already installed. (More about using these modules later.)
- it includes a package manager, called `conda`, which makes it very easy to install additional modules
- it includes a decent Python development environment, called Spyder, and Jupyter, which you will need to work most effectively with these course materials

#### Using these materials

This document is a _Jupyter Notebook_: a web browser-based format in which it's possible to mix sections of text (such as this one) with cells of code, images, etc. The file can be displayed in a fixed, 'static' form or, using a _Jupyter server_, it can be rendered interactively, allowing the user to edit and execute new and existing code blocks, and make additional notes in text blocks etc. A brief overview of your options when working with this document:

__1. Work in the Python shell.__ This is probably the simplest option. Once you have Python installed (see above), open a terminal and type:

```bash
python
```

You should see the prompt change to __`>>>`__, which indicates that you are now in the Python _shell_. At this prompt you can type or copy/paste any of the code lines/blocks written below, and press 'enter' to execute them. To leave this shell environment, type:

```python
exit()
```

__2. Work in an integrated development environment (IDE) e.g. Spyder.__ An integrated development environment provides a richer setting for experimenting with the language than can be provided by a shell or a simple text editor. Spyder is typical of many IDEs: it provides a text editor window, which can be used to write multiple-line _scripts_ of Python commands, and a shell window to try out individual lines/display the output of executing the scripts. It also provides a third pane that can be used to inspect objects in your environment, help pages, figures created by your code etc.

__3. Work with the interactive notebook.__ To run this notebook interactively, you need a Jupyter server. If you installed Anaconda, you have Jupyter installed already. Download this notebook, open a terminal and type:

```bash
jupyter notebook
```

After a few moments, a new web browser window/tab should open, displaying a menu that can be used like a file browser - click on the links to open the corresponding file or folder on your filesystem. Through this interface, navigate to wherever you downloaded the notebook file to, and click on the notebook filename. A new tab should open, with the notebook rendered in it.

To work with the notebook, click in one of the code cells (these are the cells with `In [ ]:` next to them), then press 'ctrl+enter' to execute that cell. You can also use 'alt+enter' to execute the cell and open a new, blank, code cell underneath. You can go back and edit cells before re-executing them. Be aware of a few things:

- the execution order of the cells is important and sometimes not immediately obvious. "Normal" Python code is interpreted from top to bottom i.e. the values associated with things at the bottom are determined by whatever was written above. In a Jupyter notebook, cells can be executed in whatever order the user likes. This means that the lower of two cells, if executed before the higher, can influence the output of the higher. Pay attention to the numbers in the `[ ]` next to the cells - these tell you the order in which cells were executed.
- you might want to create a new cell for your own experimentation, instead of editing/overwriting the pre-existing example cells. This will help you to avoid getting stuck after having deleted a working example. If you do get in a muddle, you have a couple of options: pressing 'ctrl+z' will undo the most recent changes in a cell, just like you're probably used to; or you can copy/paste the cells from this notebook if you have it open in a browser.
- you can change the type of the cell you're working on using the dropdown menu at the top of the page - text cells are referred to as 'Markdown'.

A more comprehensive introduction to Jupyter will be provided during the course.

#### The Language

Python is a high-level programming language, which is highly versatile, relatively accessible, and extremely well-supported. If you've never programmed before, Python is a great language to start with. A couple of the big reasons for this are: by the standards of programming languages, it is easy for humans to read; and, instead of spending a lot of time worrying about defining rigid types and arranging things in multiple files, it is quick to get started with _actually doing things_ with Python.

### Variables

The basic building blocks of Python are variables - small packages of data, which can be processed, operated on, and returned as the desired output of a program.

One simple datatype is an integer value:

#### Integers

In [None]:
100

These integer values can be combined in mathematical operations, treating Python as a simple calculator:

In [None]:
# addition
1378 + 6670

In [None]:
# division
81 / 9

In [None]:
# exponentiation
13**6

_Note: the lines beginning with `#` above are comments, and are ignored by the Python interpreter. You can use comments after some code on the same line too - everything after the `#` will be ignored. You should use comments often to annotate your code. It will help you and others to understand your programs after you've written them._

Values can be assigned to a variable name, which allows them to be referred to and reused later on.

In [None]:
a = 3
b = 7
c = 1

a * (b + c) # equivalent to 3 x (7 + 1)

You should notice that the assignment of a name to a value is done with the `=` symbol. It's also very important to note that each line doesn't end with some character, like `;`, as is the case in many other languages. Instead, the line breaks are interpreted as the ending of each statement. This is the first example of 'empty' characters, known as _whitespace_, being important in the syntax of Python. We'll see other examples of this later, as well as some cases where line returns can be ignored in the middle of a statement.

#### Floating Point Numbers

As well as integers, we can also use numeric values with decimal places of precision. These are referred to as _floating point numbers_ or 'floats'.

In [None]:
d = 0.25
e = 3.14
f = .33

d + e - f

We can also operate on a mix of the two different types.

In [None]:
a * f

Although the two types can be mixed together like that, it's important to know that you can get different behaviours depending on which type you use. For example, depending on which version of Python you're using, you might find that you get unexpected results when dividing integers:

In [None]:
%%script python2

g = 10
h = 4
print(g/h)

In [None]:
%%script python3

g = 10
h = 4
print(g/h)

_Note: the `print` function displays (prints out) the value of whatever variables it is given in the `()` parentheses._
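The difference between the two versions comes from how the `/` operator treats two integers. The block below is a minimal sketch, assuming it is run with Python 3: `/` performs 'true' division and returns a float, while the `//` operator performs 'floor' division, which is how Python 2 treats `/` when both values are integers. The names `true_quotient` and `floor_quotient` are only used here for illustration.

```python
g = 10
h = 4

true_quotient = g / h    # true division: the result is a float
floor_quotient = g // h  # floor division: the result is rounded down to a whole number

print(true_quotient)
print(floor_quotient)

# converting one of the values to a float first gives true division in Python 2 as well
print(g / float(h))
```

If you need your code to behave in the same way in both versions, being explicit about which kind of division you want (`//`, or a conversion with `float`) is one way to avoid surprises.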

#### Object Types

These integers and floats are treated differently because they are different _types_ of variable. In other languages, the type of a variable must be formally declared when it is created, but in Python the type of a variable is determined by the value that it has been assigned. Two things are important for you to know at this point:

1. You can check the type of a variable at any time with the `type` function
2. At any time the value of a variable can be replaced, such that a variable of the same name can have several different types within the same program

In [None]:
type(a)

In [None]:
type(f)

In [None]:
f = 100

In [None]:
type(f)

This is important, because it means that it's up to you as the developer to avoid accidentally overwriting the value of one variable with another. Python won't warn you when this happens, so you might only realise that something has gone wrong much later on.

#### Strings

It's time to introduce a third type of data, strings:

In [None]:
my_string = "hello world"
type(my_string)

A string is a sequence of characters: in the example above, some letters and a space. The string value must be enclosed in quotation marks. You can use single `''` or double `""` quotation marks: they don't have different meanings in Python - just make sure that you don't mix the two together!
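As a small illustration of the point about quotation marks, the sketch below creates the same string with each kind of quote and compares the two. It also shows one practical reason for having both: a string delimited by one kind of quotation mark can contain the other kind. The variable names `single`, `double`, and `quote` are made up for this example.

```python
single = 'hello world'
double = "hello world"

# the two kinds of quotation mark create identical strings
print(single == double)

# a string delimited by double quotes can contain a single quote (and vice versa)
quote = "it's a string"
print(quote)
```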

You can perform some of the same operations on strings that you can on numeric data, such as addition. Python won't let you combine strings and numbers in this way, though:

In [None]:
second_string = "hallo welt"

my_string + second_string

In [None]:
my_string + 14

#### Error Messages

Executing the cell above caused an error, which results in an error message being output. This error message is reproduced below:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-7b2c260e57d6> in <module>()
----> 1 my_string + 14

TypeError: must be str, not int
```

Let's take a moment to understand what this message is telling us. At the top of the message, we get a heading that tells us what kind of error has arisen - a `TypeError`. This means that the error is in some way related to the _type_ of the objects being operated on. 

Next, we have what is referred to as the _Traceback_ - a tracking of the error through the stack of operations that were being attempted. _(In this case, the traceback only contains one block, but in more complicated programs the traceback can be much longer, as the error is traced through all of the lines of code in many different files/locations that are involved in a particular operation. In general, if you do get a long traceback you can ignore the middle blocks and only look at the first one and the last one.)_ In the traceback itself, we see a weird name for the code cell above - `<ipython-input-57-7b2c260e57d6> in <module>()`. This can be ignored.

Now the most helpful part, an arrow `---->` pointing to the line where the error occurred. Often there will also be a second pointer `^` on the line below to provide further guidance as to where (Python thinks) the error can be found.

Last but not least, we get the error message text itself: `TypeError: must be str, not int`. This tells us more about the problem encountered. In this case, the issue is that when we try to use addition on a string, Python expects another string to be provided, and we gave it an integer. Some other languages will do strange things in these circumstances, such as converting the `14` to a string and returning `hello world14`, but Python simply won't allow it and throws an error.

#### Type conversions

If `hello world14` was actually what you wanted from the operation above, you need to first convert the integer value `14` to a string. This is done using the `str` function:

In [None]:
my_string + str(14)
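Conversions work in the other direction too. The sketch below assumes that you have a number stored as a string (the made-up variable `number_as_text`) and uses the built-in `int` and `float` functions to turn it into numeric values, before converting the result of a calculation back to a string with `str`.

```python
number_as_text = "14"

as_integer = int(number_as_text)    # string -> integer
as_float = float(number_as_text)    # string -> float
back_to_text = str(as_integer + 1)  # integer -> string

print(type(as_integer))
print(type(as_float))
print("hello world" + back_to_text)
```

These conversions only succeed when the string looks like a number: something like `int("hello")` results in an error instead.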

#### Attributes & Methods

In Python everything is an _object_. This means that these pieces of data - an integer, a float, a string of characters - carry with them additional information about the data (referred to as _attributes_) and instructions for common operations (called _methods_) that you might want to perform with data of its particular type.

Objects of different types carry different attributes and methods. To see the available options for any given variable, you can use the `dir` function:

In [None]:
type(my_string)

In [None]:
dir(my_string) # attributes and methods for a string object

In [None]:
type(a)

In [None]:
dir(a) # attributes and methods for an integer object

A lot of the things in those lists look a bit strange. The names flanked by double-underscores e.g. `__reduce__` are intended for internal use for the object only; it's the names at the end of the lists, which don't have flanking underscores, that you should take note of.

In the case of the string object, `my_string`, available methods include `upper`, `endswith`, and `split`, which can respectively be used to convert letters to upper case, check whether the string ends with a specified substring, and split the string into substrings. These are all commonly-required string operations. The integer object, `a`, carries attributes common to numeric data types, such as `numerator` and `denominator`, and methods to convert the integer to/from raw bytes.

To access an attribute of an object, specify the name of the object followed by a dot `.` and the name of the attribute:

In [None]:
a.numerator

Use the same syntax to invoke a method of an object, but with a set of `()` parentheses at the end:

In [None]:
my_string.upper()

These parentheses are important: they are how information and other items required to control the action of the method are provided. Such additional pieces of data for the method are called _arguments_.
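To make this concrete, here is a short sketch using the three string methods mentioned earlier. `upper` is called without any arguments, while `endswith` and `split` are each given one argument inside the parentheses. The argument values chosen here are just examples.

```python
my_string = "hello world"

print(my_string.upper())            # no arguments needed
print(my_string.endswith("world"))  # one argument: the ending to check for
print(my_string.split(" "))         # one argument: the character(s) to split the string on
```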

#### Getting Help

For an example of using arguments, let's use the `count` method of our string object. First, we need to see how to use the method. We can use the `help` function to get more information about usage of any object, function, or method.

In [None]:
help(my_string.count)

The output of that function call is reproduced below:

```
Help on built-in function count:

count(...) method of builtins.str instance
    S.count(sub[, start[, end]]) -> int
    
    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.
```

The help message starts with a definition of what was passed to it: `count(...) method of builtins.str instance` (in other words, the `count` method of a string object). Next, we get an example of how to use the method (the `usage statement`):

`S.count(sub[, start[, end]]) -> int`

This tells us that the method takes at least one argument (`sub`), but can take two or three. We can see that `start` and `end` are optional arguments, because they are enclosed in `[]` square brackets. The usage statement also tells us that the method call will return an integer value (`-> int`).

The third part of the output from `help` is a short description of the method and its usage. We will cover what is meant by "slice notation" soon.

Now that we know how to use count, let's try invoking it:

In [None]:
my_string.count("o")

In [None]:
my_string.count("ll")

In [None]:
my_string.count("H")

Note that you must provide the letter/string to be counted enclosed in quotation marks `""` or `''`. Also note that the counting is case sensitive.
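The optional `start` and `end` arguments from the usage statement can be tried in the same way. In the sketch below, they restrict the counting to part of the string. It relies on two details that are explained properly later in this lesson: positions are counted from 0, and the `end` position itself is not included.

```python
my_string = "hello world"

print(my_string.count("o"))        # count in the whole string
print(my_string.count("o", 5))     # count only from position 5 onwards
print(my_string.count("o", 0, 4))  # count only from position 0 up to, but not including, 4
```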

You can capture the output of these operations in another variable:

In [None]:
shouting = my_string.upper()

In [None]:
print(shouting)

So far, so good. But, in order to really start doing some of the things that programming is most useful for, we need to start working with larger collections of data.

### Data Structures

#### Lists

One of the fundamental collections of data in Python is the _list_. A list is an ordered sequence of values. The values can be of any type in any combination. Below is an example of a list of strings, which is created by wrapping the comma-separated values in `[]` square brackets:

In [None]:
shopping = ['bread', 'potatoes', 'eggs', 'flour', 'rubber duck', 'pizza', 'milk']

Because lists are ordered, we can access the entries based on their position in the list - their _index_. This is where we have to introduce one of the more confusing things about Python: indexing begins at 0. So, to access the first entry in the example list, we use the following syntax:

In [None]:
shopping[0]
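Because counting starts at 0, the index of the last entry is always one less than the number of entries in the list. The following sketch illustrates this using the built-in `len` function, which is not otherwise covered in this section and returns the number of entries in a list.

```python
shopping = ['bread', 'potatoes', 'eggs', 'flour', 'rubber duck', 'pizza', 'milk']

print(len(shopping))                # the number of entries in the list
print(shopping[1])                  # the second entry
print(shopping[len(shopping) - 1])  # the last entry
```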

We can use this indexing to change the values at the different positions of the list:

In [None]:
shopping[6] = "sausages"
shopping

To add a new entry to the end of a list, use the `append` method:

In [None]:
shopping.append("basil")
shopping

To add a new entry in another position, without overwriting the value already at that index, use the `insert` method:

In [None]:
shopping.insert(3, 'spaghetti')
shopping

When you `insert` a new value, all of the entries to the right of that position are shifted up by one to make room.

You can further control the order of lists using the `sort` and `reverse` methods:

In [None]:
shopping.reverse()
shopping

In [None]:
shopping.sort()
shopping
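One detail that often catches people out: `sort` and `reverse` change the list itself, rather than giving you back a new, reordered list. The sketch below uses a small made-up list, `numbers`, to show that the value returned by `sort` is `None`, while the original list has been rearranged.

```python
numbers = [3, 1, 2]

result = numbers.sort()  # sorts the list in place

print(result)   # the method itself returns None
print(numbers)  # the original list has been reordered
```

This is why the cells above call the method on one line and then display `shopping` on the next, instead of capturing the output of the method in a variable.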

#### Tuples and Mutability

Another type of data in Python is the _tuple_. Below is an example:

In [None]:
fibonacci = (1, 1, 2, 3, 5, 8, 13, 21)

Tuples are similar to lists in a few ways: they are ordered sequences of values, and these values can be accessed by index. They are defined in a similar way, too: only using `()` parentheses instead of `[]` square brackets. However, there is a key difference - tuples are _immutable_.

An immutable object is an object whose value cannot be changed in place after it has been set. We've already seen that lists are not immutable (they are _mutable_), because we were able to change the order of our `shopping` list, add new entries etc. Once a tuple has its entries, those entries cannot be rearranged or altered:

In [None]:
fibonacci[2] = 49

This is a subtle difference, but quite an important one. Tuples should be used whenever you want to be sure that a collection of information only makes sense together and in the order specified, such as with our Fibonacci sequence above.

Other immutable data types include strings, integers, and floats.
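The sketch below illustrates what immutability means in practice. It uses a `try`/`except` block, which has not been introduced in this lesson, purely so that the error from the tuple assignment is displayed without stopping the rest of the code. It then shows that a string method such as `upper` creates a new string and leaves the original unchanged. The name `greeting` is made up for this example.

```python
fibonacci = (1, 1, 2, 3, 5, 8, 13, 21)

try:
    fibonacci[2] = 49  # attempting to change an entry of a tuple
except TypeError as error:
    print("the tuple could not be changed:", error)

greeting = "hello world"
shouting = greeting.upper()  # creates a new string

print(greeting)  # the original string is unchanged
print(shouting)
```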

#### Dictionaries

The last data structure that it is important to introduce at this stage is the _dictionary_. This is a good data structure to use whenever you have paired data, such as language courses and their numbers of students in the example below:

In [None]:
studentNumbers = { 'Hungarian': 16, 
                   'Hindi': 12,
                   'Portuguese': 20,
                   'Finnish': 9,
                   'English': 9 }

_Note: the dictionary above could have been defined on a single line, as_

```Python
studentNumbers = {'Hungarian': 16, 'Hindi': 12, 'Portuguese': 20, 'Finnish': 9, 'English': 9}
```

_but the Python interpreter also understands the construction when it is split over multiple lines, and I think that the multi-line approach is a bit easier to read._

Each dictionary entry is a pair: a _key_, in this case the language; and a _value_, the integer, associated with that key. To access the value associated with a particular key, use 

`dictionary[key]`

So, to access the number of students in the Finnish class in the `studentNumbers` example dictionary, we would type:

In [None]:
studentNumbers['Finnish']

A dictionary can't contain duplicate keys, but can contain duplicate values i.e. `studentNumbers` could contain multiple courses with 8 students each, but only one entry for Polish, one for Italian, and so on.
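To see what this means in practice, the sketch below defines a small made-up dictionary, `courses`, in which the same key is written twice. Python does not raise an error here: only one entry for that key is kept, holding the value that was given last.

```python
courses = {'Finnish': 9, 'English': 9, 'Finnish': 3}

print(courses)             # only one 'Finnish' entry remains
print(courses['Finnish'])  # the value given last is the one that was kept
print(len(courses))        # two entries, sharing no keys
```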

To change the value associated with a particular key, we use the same syntax as above, combined with the `=` assignment operator:

In [None]:
studentNumbers['Hungarian'] = 8
print(studentNumbers)

Similarly, to create a new entry use the new key in the `[]` square brackets:

In [None]:
studentNumbers['Dutch'] = 14

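_Note: in recent versions of Python, dictionaries remember the order in which their entries were created. This behaviour is only guaranteed by the language from version 3.7 onwards: in version 3.6 it is an implementation detail, and in earlier versions dictionaries had no inherent, fixed order. It should not be relied upon if your code needs to run on older versions._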

The keys to a dictionary can be any immutable data type (strings, integers, ...) and the values can be of any type at all, and need not be consistent for all entries.
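As an illustration, the sketch below builds a made-up dictionary, `mixed`, with keys and values of several different types, and then tries to use a list as a key. Because lists are mutable, this results in a `TypeError`. As with the tuple example, a `try`/`except` block (not covered in this lesson) is used only to display the error message without stopping the code.

```python
mixed = {'course': 'Finnish',  # string key, string value
         2017: [9, 12],        # integer key, list value
         (1, 2): 3.5}          # tuple key, float value

print(mixed[2017])
print(mixed[(1, 2)])

try:
    mixed[['a', 'list']] = 'not allowed'  # lists are mutable, so they cannot be keys
except TypeError as error:
    print("could not use a list as a key:", error)
```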