# Planning and Conducting a Psychology Experiment (Session 5, Nov. 26 2020): Introduction to Programming in Python (I)

This is a notebook with practical examples and exercises for the hands-on part on programming in Python of the seminar **"Planning and Conducting a Psychology Experiment"** co-taught by [Tomás Goucha](https://www.cbs.mpg.de/mitarbeiter/goucha), [Matteo Maran](https://www.cbs.mpg.de/employees/59327), [Giorgio Papitto](https://www.cbs.mpg.de/person/papitto/373360), and [Patrick C. Trettenbrein](https://trettenbrein.biolinguistics.eu) at [University of Leipzig](https://www.uni-leipzig.de/en/) in the winter term 2020-21.

The goal of our two hands-on sessions is to introduce participants to the basics of programming in Python. Our focus here is on understanding fundamental concepts in order to be able to use Python for scientific purposes such as stimulus presentation or data analysis.

In this first hands-on session we'll cover the very basics of programming (in Python):
- What is Python? And why use it?
- Ways of installing Python
- How does the Python environment work (packages, etc.)?
- What are names, expressions, statements?
- Data formats: Strings, lists, and dictionaries vs. numerical data
- Functions, conditions and loops

## What is Python? And Why Use it?

[Python](https://www.python.org) is:
- easy to learn
- a general-purpose programming language (as opposed to, e.g., R)
- available for all major platforms (Windows, Linux, macOS)
- open source
- backed by a large and very active development community
- very flexible thanks to a large number of default libraries and additional packages

### Popularity

Python is increasingly popular (data from [PYPL](https://pypl.github.io/)):

<img src="images/pypl_plot.png" width="75%"/>

And the popularity of Python continues to grow:

<img src="images/pypl_table.png" width="40%"/>

### Usage Example: *PsychoPy*

Especially among psychologists and neuroscientists, Python is now frequently used for stimulus presentation and data analysis:

<a href="https://www.psychopy.org" target="_blank"><img src="images/PsychoPy_logo.png" width="40%"/></a>

*PsychoPy* comes with a GUI (the *Builder*) that can be used to create simple experiments without writing any code. More complex experimental designs, however, will require at least a bit of coding in Python.

<img src="images/PsychoPy_builder.png" width="80%" />

We will be focusing on creating experiments in *PsychoPy* in the upcoming Sessions 7 and 8.

### Usage Example: *Nipype*

Neuroimaging data can be analysed in many different ways using a wide variety of different software packages. The *Nipype* project provides a uniform Python interface to existing neuroimaging software and facilitates interaction between these packages within a single workflow.

<a href="https://nipype.readthedocs.io/en/latest/" target="_blank"><img src="images/nipype_logo.png" width="40%"/></a>

That is, *Nipype* makes it easy to interact with tools from different software packages and combine processing steps from these different packages into a single workflow:

<img src="images/nipype_architecture.png" width="40%" />

We will not be covering *Nipype* any further in this class; it merely serves as an example to illustrate what Python can be, and is being, used for.

## Installing Python

There are many different ways of installing Python. One way is to obtain the official installer for your operating system from the [website of the Python project](https://www.python.org).

If you are using Linux or macOS, chances are that you'll already have a version of Python installed:

<img src="images/Python_on_macOS.png" width="70%" />

This screenshot shows that macOS comes with a version of Python 2 already installed. However, in a scientific context we will almost always want to use a more current version (i.e., Python 3) and be able to update it easily as needed.

### *Anaconda*

One popular way of using Python in a scientific context is to use a ready-made Python environment such as [*Anaconda*](https://www.anaconda.com/). *Anaconda* is a collection of different components with a current version of Python at its core. Another advantage is that *Anaconda* can be installed as a regular user (i.e., it does not require administrator or root privileges).

<img src="images/Anaconda.png" width="90%" />

An installation of *Anaconda* already includes an integrated development environment (IDE) such as, for example, [*Spyder*](https://www.spyder-ide.org). Another IDE that has become very popular in recent years is [PyCharm](https://www.jetbrains.com/pycharm/).

<img src="images/Spyder.png" width="90%" />

We will not discuss IDEs any further here. Instead, we will be using (and in fact already are using right now) [Jupyter Notebook](https://jupyter.org) to create interactive notebooks for coding in Python. If you are familiar with R, think *RMarkdown*. That is, *Jupyter* notebooks can be used to code in an interactive and reproducible manner. The following screenshot shows the editing of this very notebook:

<img src="images/Jupyter_notebook.png" width="90%" />

Usually, you would run *Jupyter* locally (e.g., from *Anaconda*) to create notebooks in your browser. However, it is also possible to run notebooks on a service such as [*binder*](https://mybinder.org).

### What Does Python Code Look Like?

Below you will find some examples of Python code to give you a first idea. Notice that in this *Jupyter* notebook, cells of code are prefixed with `In [n]:`, where `n` is the execution count, i.e., a counter recording the order in which the cells were run in the current session (this is why the numbers in this notebook are not consecutive). The output of our commands (if any) is printed underneath and is sometimes prefixed with `Out[n]:`.

If you are running this notebook locally, you can execute the code examples yourself by selecting a cell and pressing **Shift ⇧ + Enter ⏎**. Similarly, if you have accessed this notebook using the link to *binder* that was provided prior to this session, you should be able to run the code in your browser without having to install anything on your system:

In [10]:
print("Hello world!")

Hello world!


In [11]:
"i cannot find capslock! help, what do i do now?".upper()

'I CANNOT FIND CAPSLOCK! HELP, WHAT DO I DO NOW?'

In [12]:
for n in range(10, 101, 10):
    print(n, end=", ")

10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 

In [13]:
print(", ".join([str(n) for n in range(10, 101, 10)]))

10, 20, 30, 40, 50, 60, 70, 80, 90, 100
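The last two cells rely on `range(start, stop, step)`, which counts up to, but *excluding*, `stop`; this is why `101` rather than `100` is passed as the upper bound. A minimal sketch to check this (the names `with_101`, `with_100`, and `labels` are just illustrative):

```python
# range(start, stop, step) yields numbers from start up to, but excluding, stop
with_101 = list(range(10, 101, 10))
with_100 = list(range(10, 100, 10))

print(with_101)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(with_100)  # [10, 20, 30, 40, 50, 60, 70, 80, 90]

# The list comprehension converts each number to a string first,
# because ", ".join() only accepts strings
labels = [str(n) for n in with_101]
print(", ".join(labels))  # 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
```

Unlike the `for` loop with `end=", "`, the `join()` version does not leave a trailing comma after the last number.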


## How does the Python environment work?

If we were not using a *Jupyter* notebook for writing and running our code, we would most likely be writing a script in an IDE such as Spyder or PyCharm. In some cases, scripts may get rather long and we may want to split them into several files. Also, we may have implemented some functionality that we want to reuse in other scripts. For this purpose, we can put such definitions into a file and import them in any script where we want to use them. Such a file is called a **module**. By default, a Python installation already contains a large number of modules for different purposes (the so-called *standard library*) that we can use.

Consider, for example, that we want to use Python to perform some mathematical calculations. Generally, we can just use Python like a calculator:

In [15]:
(10 - 9) / 2 - 1 * 1

-0.5

This shows that Python handles `()`, `*`, `/`, `+`, and `-` according to the usual order of operations (parentheses first, then `*` and `/`, then `+` and `-`). The output of the above calculation is `-0.5`: **Hence, notice that Python uses the English decimal separator `.`, as opposed to the German `,`!**
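Using the German `,` by mistake usually does not produce an error message right away, which makes it easy to miss. A minimal sketch of what happens instead (the names `half` and `not_half` are only illustrative):

```python
# * and / are evaluated before + and -, just as in ordinary arithmetic
print((10 - 9) / 2 - 1 * 1)  # -0.5

# With a dot, Python reads a decimal number
half = 0.5
print(type(half))  # <class 'float'>

# With a comma, Python silently creates a tuple of two integers instead
not_half = 0,5
print(not_half)        # (0, 5)
print(type(not_half))  # <class 'tuple'>

# Converting text that uses a German decimal comma fails ...
try:
    float("0,5")
except ValueError as err:
    print("ValueError:", err)

# ... unless the comma is first replaced by a dot
print(float("0,5".replace(",", ".")))  # 0.5
```

The last two steps matter in practice when reading numbers from files (e.g., response data) that were saved with German locale settings.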

Now, if we want to perform a somewhat more complex calculation, we may want to rely on already defined functions that are available as part of the `math` module. We import this using:

In [17]:
import math

This makes a number of mathematical functions available, which we can now access using the prefix `math.`. For example, functions such as `log()` and `sin()`, or constants such as `pi` and `e`, can be used as follows:

In [22]:
math.pi

3.141592653589793

We can also use the `math` module in calculations (note that `math.log()` computes the natural logarithm, i.e., to base *e*):

In [27]:
math.log(10 * math.pi)

3.447314978843446
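A few more functions and constants from the `math` module, as a minimal sketch. Two details that often trip up beginners are shown in the comments: `math.log()` defaults to base *e*, and trigonometric functions expect angles in radians rather than degrees.

```python
import math

# Constants
print(math.pi)  # 3.141592653589793
print(math.e)   # 2.718281828459045

# math.log(x) is the natural logarithm; math.log(x, base) accepts an optional base
print(math.log(math.e))  # 1.0

# Dedicated functions for base 2 and base 10
print(math.log2(8))       # 3.0
print(math.log10(1000))   # 3.0

# sin() expects radians; math.radians() converts degrees to radians
print(math.sin(math.pi / 2))       # 1.0
print(math.sin(math.radians(90)))  # 1.0
```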

In addition to the standard modules that ship with Python, a myriad of additional modules for all purposes is available via packages. An overview of all available packages can be found on the [Python Package Index](https://pypi.org) (PyPI). Additional packages can be installed in the terminal using the command `pip install name_of_package`. The command `pip search name_of_package` was formerly used to first check whether a package with a particular name exists, but it no longer works because PyPI has disabled the search interface it relied on; use the search on the PyPI website instead.

If you are using Anaconda, you can also use the [`conda` command](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) in the command line of your system to install additional packages. Alternatively, you can also use the Anaconda GUI which you already saw above. Notice, however, that the packages available via `conda` may be less current than what is available via `pip`.
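Before installing anything, it can be handy to check from within Python whether a module can already be imported. A minimal sketch using the standard library's `importlib.util.find_spec()`, which returns `None` when no module of that name can be found; `is_installed` is a hypothetical helper name chosen for this example. One assumption to keep in mind: the name you *import* is not always the name you *install* (e.g., the package `scikit-learn` is imported as `sklearn`).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a module with this name can be imported, else False."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("math"))      # True: part of the standard library
print(is_installed("psychopy"))  # True only if PsychoPy has been installed
```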

## What are names, expressions, statements?

Many students of psychology may already be familiar with scripting languages that are used for a specific purpose, such as R. Python is slightly different in some respects and, in essence, follows a simple logic: Everything in Python is an **object**; and objects can have **names**.

Unlike many other languages, Python uses names instead of variables. That is, the `=` operator is used to assign a name:

In [32]:
name = 1 + 1

Now, the **expression** `1 + 1` has been evaluated, and the resulting object (the integer `2`) has been assigned the name `name`. Notice that Python has conventions for naming (summarized in the official style guide, PEP 8). According to these, `name` is actually not a **good** name for our result. Why? It is not clear what it really contains. Hence, you should always try to use names that are descriptive and precise, yet not too long. For example, we could have used `calc_result` as a name instead (multiple words in names are usually separated using `_` in Python).
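The difference between the expression `1 + 1` and the assignment that gives its result a name can be made visible with the built-in `eval()`, which only accepts expressions. This is a minimal sketch for illustration only; `eval()` is rarely needed in everyday scripts.

```python
# The expression 1 + 1 is evaluated first, and the resulting object
# (the integer 2) is then bound to the descriptive name calc_result
calc_result = 1 + 1
print(calc_result)        # 2
print(type(calc_result))  # <class 'int'>

# eval() accepts an expression, i.e. code that produces a value ...
print(eval("1 + 1"))  # 2

# ... but rejects an assignment, which is a statement, not an expression
try:
    eval("calc_result = 1 + 1")
except SyntaxError:
    print("an assignment is a statement, not an expression")
```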

Also, there are a number of reserved words (so-called **keywords**) which we cannot use as names at all; trying to do so results in a `SyntaxError`. We can see those words by looking at the `keyword` module:

In [33]:
import keyword
keyword.kwlist

['False',
 'None',
 'True',
 'and',
 'as',
 'assert',
 'async',
 'await',
 'break',
 'class',
 'continue',
 'def',
 'del',
 'elif',
 'else',
 'except',
 'finally',
 'for',
 'from',
 'global',
 'if',
 'import',
 'in',
 'is',
 'lambda',
 'nonlocal',
 'not',
 'or',
 'pass',
 'raise',
 'return',
 'try',
 'while',
 'with',
 'yield']
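Instead of scanning the list by eye, the same module can check a single word for us. A minimal sketch using `keyword.iskeyword()`, followed by what happens if we try to use a keyword as a name anyway (`compile()` is used here only so that the error can be caught and printed instead of stopping the notebook):

```python
import keyword

# keyword.iskeyword() checks whether a string is a reserved word
print(keyword.iskeyword("for"))          # True
print(keyword.iskeyword("calc_result"))  # False

# Using a keyword as a name is not just bad style: Python refuses to run the code
try:
    compile("for = 3", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```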

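Similarly, we should avoid overwriting so-called **builtins**, that is, the functions and other objects that are available in Python by default (such as `print()`, `len()`, or `list()`). Unlike keywords, Python will let us reuse these names, but doing so hides the original function, and we would sooner or later run into serious trouble.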

A list of these builtins can be viewed as follows:

In [34]:
dir(__builtins__)

['ArithmeticError',
 'AssertionError',
 'AttributeError',
 'BaseException',
 'BlockingIOError',
 'BrokenPipeError',
 'BufferError',
 'ChildProcessError',
 'ConnectionAbortedError',
 'ConnectionError',
 'ConnectionRefusedError',
 'ConnectionResetError',
 'EOFError',
 'Ellipsis',
 'EnvironmentError',
 'Exception',
 'False',
 'FileExistsError',
 'FileNotFoundError',
 'FloatingPointError',
 'GeneratorExit',
 'IOError',
 'ImportError',
 'IndentationError',
 'IndexError',
 'InterruptedError',
 'IsADirectoryError',
 'KeyError',
 'KeyboardInterrupt',
 'LookupError',
 'MemoryError',
 'ModuleNotFoundError',
 'NameError',
 'None',
 'NotADirectoryError',
 'NotImplemented',
 'NotImplementedError',
 'OSError',
 'OverflowError',
 'PermissionError',
 'ProcessLookupError',
 'RecursionError',
 'ReferenceError',
 'RuntimeError',
 'StopAsyncIteration',
 'StopIteration',
 'SyntaxError',
 'SystemError',
 'SystemExit',
 'TabError',
 'TimeoutError',
 'True',
 'TypeError',
 'UnboundLocalError',
 'UnicodeDecodeError',
 ...
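To see why overwriting a builtin causes trouble, here is a minimal sketch that (deliberately, and only for demonstration) binds the name `list` to our own data, then repairs the damage with `del`. The standard library module `builtins` still holds the original function.

```python
import builtins

# Binding the name list to our own data hides the built-in list() function
list = [1, 2, 3]
try:
    list("abc")
except TypeError as err:
    print("TypeError:", err)  # TypeError: 'list' object is not callable

# Deleting our own name makes the built-in visible again
del list
print(list("abc"))            # ['a', 'b', 'c']
print(list is builtins.list)  # True
```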

How is naming different from assigning a variable? In short, a variable in the traditional sense would, for example, reserve a certain amount of memory for storing a value. In Python, however, everything is an object and names are merely references (pointers) to objects. We can see this when we first create a list with the values `1`, `2`, and `3` using square brackets `[]` and give it the name `a`. Then we assign `a` to `b`, i.e., we bind the name `b` to the same object:

In [1]:
a = [1, 2, 3]
b = a

Let's see what happens when we change the third value in the list `b` to `4`. **Important: Notice that Python, unlike R, starts indexing at `0`, not at `1`!** Hence, in order to access the third element of a list, we have to use the index `[2]` and assign the new value using the `=` operator. After this, we tell Python to show the current value of `a`:

In [2]:
b[2] = 4
a

[1, 2, 4]

Coming from R or other languages, we might expect that by using the `=` operator in the cell above, we had created a copy of `a` called `b`, so that modifying an element of `b` would not affect the values in `a`. In Python, however, `a` has been changed as well. (You can try the equivalent code in R, writing `c(1, 2, 3)` instead of `[1, 2, 3]`. Just be aware that keeping the index `[2]` will modify the second element of the vector in R [i.e. `2`], whereas here in Python it changed the third element [i.e. `3`].)
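The surprise only arises when the shared object is *changed* (here, one element of the list is replaced). Assigning a completely new object to a name simply moves that name and leaves the other one alone. A minimal sketch, using the new names `x` and `y` so that `a` and `b` from the cells above stay untouched:

```python
x = [1, 2, 3]
y = x           # y is a second name for the same list object

y[2] = 4        # changing the object through y ...
print(x)        # [1, 2, 4]  ... is visible through x as well

y = [7, 8, 9]   # binding y to a new object ...
print(x)        # [1, 2, 4]  ... leaves the list named x untouched
print(y)        # [7, 8, 9]
```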

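We can use the `id()` function to see what actually happened. (Notice that a function call such as `id(a)` is itself an **expression**, just like `1 + 1`: it is evaluated and produces a value. A **statement**, by contrast, is an instruction for Python to perform an action, such as the assignment `b = a` or `import math`, and does not itself produce a value.) The `id()` function returns the identity of an object in Python (remember that *everything* is an object), an integer that is unique among all objects existing at the same time (in CPython, the standard implementation, it is the object's address in memory):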

In [3]:
id(a)

140519284093512

In [4]:
id(b)

140519284093512

Using the `id()` function shows that both `a` and `b` are indeed names that point to the same object in memory. Hence, if we want to create an actual copy of `a`, one way is to import the `copy` module and use its `copy()` function:

In [5]:
import copy
b = copy.copy(a)
id(b)

140519281850888
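`copy.copy()` creates a so-called *shallow* copy. For lists, `a.copy()` and `list(a)` do the same. A shallow copy only duplicates the outer list, so for nested data (e.g., a list of trial lists) the `copy` module also offers `copy.deepcopy()`. A minimal sketch with new, illustrative names, so that `a` and `b` above are not affected:

```python
import copy

original = [1, 2, 4]

# Three common ways of making a (shallow) copy of a list
copy_1 = copy.copy(original)
copy_2 = original.copy()
copy_3 = list(original)
print(copy_1 == original, copy_2 == original, copy_3 == original)  # True True True
print(copy_1 is original, copy_2 is original, copy_3 is original)  # False False False

# A shallow copy only duplicates the outer list; nested lists are still shared
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)

nested[0][0] = 99
print(shallow)  # [[99, 2], [3, 4]]  (inner list is shared)
print(deep)     # [[1, 2], [3, 4]]   (fully independent copy)
```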

In addition to a visual comparison of the IDs, we can also ask Python whether two names refer to the same object. One way would be to use the `==` operator to compare the *values* returned by the `id()` function, along the lines of `id(a) == id(b)`. Alternatively, we can perform this comparison directly using the `is` operator, which checks whether two names refer to the same *object* (identity) rather than whether their values are equal:

In [9]:
a is b

False
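The distinction between `==` (equal values) and `is` (same object) can be summarized in a minimal sketch, again with new, illustrative names:

```python
import copy

x = [1, 2, 4]
y = copy.copy(x)  # same values, but a different object
z = x             # another name for the very same object

print(x == y)          # True:  the values are equal
print(x is y)          # False: two distinct objects
print(id(x) == id(y))  # False: same answer as x is y
print(x is z)          # True:  z refers to the same object as x
```

As a rule of thumb, use `==` to compare data and reserve `is` for questions of identity.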

## Data formats



## Functions, conditions and loops

