## Introduction to Python and Jupyter Notebooks

**Instructor**: Lisa-Marie Rolli, Volkamer Lab, Saarland University (lisa-marie.rolli@uni-saarland.de)


**Course date**: 22nd September 2025


Based on previous contributions by Corey Taylor, Jan Philipp Albrecht, and Jaime Rodríguez-Guerra.

### 1 Python and Jupyter Notebooks
#### What is Python?

Python is a widely used general-purpose high-level programming language.
The term "high-level" means that the language has abstracted away most of the technical details and manages them for you (e.g. memory allocation).

In this course we will use Python 3.12.11 (default in Colab as of August 2025).

#### What is this interface?

This web-based interface is a so-called **Jupyter notebook**, a web application that allows you to create and share documents which contain executable code, equations, visualizations and explanatory text.

It allows you to define so-called *cells* of different formats:

- Rich-text or Markdown cells (like this cell)
- Code cells

You can *run* (execute) each cell separately by pressing <kbd>Shift</kbd>+<kbd>Enter</kbd> or <kbd>Ctrl</kbd>+<kbd>Enter</kbd>. Once a cell has been executed, all names it defines (variables, functions...) are available in **both** preceding _and_ following cells. Be careful with the order in which you execute your cells!

Some code cells additionally produce an output (text, images, etc.). In that case, the output will appear underneath the code cell.


> __Exercise__
>
> Try to execute the following cell and make sure the output appears.

In [None]:
print("This sentence will appear underneath the cell.")
# This is a comment line. 
# Lines in a code cell starting with the '#' symbol are
# ignored when you run a piece of code

# print('This sentence will NOT appear underneath the cell!')

### 2 Variables and Operations

There are several **types** of objects depending on their contents.

In the following cells, you will learn about:

- scalar (one element) objects:
    - integers (`int`)
    - boolean (`bool`)
    - floating-point numbers (`float`)
    - strings (`str`)
- collection objects:
    - lists (`list`)
    - dictionaries (`dict`)

#### Assignment

Regardless of the type, assignment can be done with the following syntax:

```python
name = value
```

You can think of _names_ as a label hanging from a _value_. As with real objects, you can attach several names to the same value:

In [None]:
color = "pink"
print(color)

Two things to note:

1. The _type_ of the value will be implicitly inferred from the value _contents_.
2. You cannot use spaces in the name. The Python style guide (PEP 8) recommends using `lowercase_words_separated_by_underscores`.
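
As a small illustration of both points, the sketch below attaches two names to the same value and uses the built-in `type()` function to show the inferred type. The names `favorite_color` and `number_of_students` and their values are made up for this example.

```python
# Two names attached to the same value
color = "pink"
favorite_color = color  # hypothetical second label for the same string

print(color)
print(favorite_color)

# The type is inferred from the value, no declaration is needed
number_of_students = 25  # hypothetical example value
print(type(color))
print(type(number_of_students))
```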

#### Collection objects

##### lists

Python has a datatype called `list`, where other variables (regardless of their type) can be positionally stored. In other words, `list`s are an ordered collection of elements: every element keeps the position at which it was stored.

The syntax for list definition uses square brackets `[ ]`, which surround the values separated by commas `,`.

The code `my_list = [5, 10]` would therefore store `5` at the first position and `10` at the second position.
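
A minimal sketch of a list definition, assuming some made-up values: the list `measurements` is hypothetical and only shows that elements of different types can be stored side by side.

```python
# A list can hold values of different types, in a fixed order
measurements = [5, 10.5, "unknown", True]  # hypothetical example values

print(measurements)
print(len(measurements))  # number of stored elements
```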

  
> *Exercise:*<br>
> *Try to:*
> - *define a list called `animals`*
> - *at the first* **position**, *store the string `"horse"`.*
> - *store the string `"spider"` TWICE at* **position** *2 and 3.*
 

In [None]:
# space to solve the exercise

Once defined, you can access the contained elements in different ways. The most common way is by using the **index** of the element. This is the position number, but the count starts from `0`.

In [None]:
my_list = [5, 10, 15]
print(my_list[0])

Note that the `index` itself could be a variable of the type `int`!

In [None]:
index = 1
print(my_list[index])

In [None]:
my_list[3]

***

Oh! Was that an error? 

<font color=red>Setting or accessing a list element at an index that does not exist results in an error!</font>

```python
my_list[3]
---------------------------------------------------------------------------
IndexError                       Traceback (most recent call last)
<ipython-input-9-831b15cbf272> in <module>()
----> 1 my_list[3]

IndexError: list index out of range

```
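
One way to avoid this error is to check the length of the list first: the valid indices run from `0` to `len(my_list) - 1`. A minimal sketch (the variable `last_index` is introduced only for this illustration):

```python
my_list = [5, 10, 15]

# A list with 3 elements has the valid indices 0, 1 and 2
last_index = len(my_list) - 1
print(last_index)
print(my_list[last_index])  # the last element
```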

#### Operations

Every object type (`int`, `str`, etc) defines some **operations** to do basic tasks.

For example, a variable of type `int` defines, among others, the following operations:
- Addition `+`
- Subtraction `-`
- Multiplication `*`
- Division `/`


Division with `/` is allowed, but it **changes** the type of the _returned_ value: dividing two `int` values always gives a `float`.


In [None]:
print(3 + 4)  
print(3 - 4)  
print(3 * 4)  
print(3 / 4)  
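
To see the type change for yourself, you can pass the result to the built-in `type()` function. The variable names `quotient` and `total` below are chosen only for this sketch.

```python
# Division with `/` returns a float, even if the result is a whole number
quotient = 4 / 2
print(quotient)
print(type(quotient))

# Addition of two integers keeps the `int` type
total = 3 + 4
print(type(total))
```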

In addition to mathematical operations, you can apply logical operations, like **comparisons**.

An **equality operation** (a comparison operation) can be performed by using the `==` operator. The result is always of type `bool`, with the value `True` or `False` depending on whether the contents of the two variables are equal or not. Besides equality, we can also check for inequality by using the `!=` operator.

Imagine this as a question you ask to the computer. "Is the content of var1 **equal** to the content of var2?"

- Equality `==`
- Inequality `!=`


In [None]:
var1 = 'hello'

In [None]:
print(var1 == var1) 
print(var1 != var1) 

Note how comparisons _always_ produce a `bool` type!

Besides equality (`==`) and inequality (`!=`), there are other comparison operations which return a `bool`-type value. Think of them as other questions you can ask the computer:
- `<` strictly less than
- `>` strictly greater than
- `<=` less than or equal
- `>=` greater than or equal
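
A short sketch of these operators in action, using two made-up variables `a` and `b`:

```python
a = 3
b = 4

print(a < b)   # True: 3 is strictly less than 4
print(a > b)   # False
print(a <= 3)  # True: "less than or equal" includes equality
print(a >= b)  # False

# The result can be stored in a variable like any other value
is_smaller = a < b
print(type(is_smaller))
```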


### 3. Control Flow

Like any programming language, Python provides constructs to control the flow of a program: `if`, `else`, `for`, `while`.

#### Decisions: if, elif, else

Conditionals allow you to easily define automated decisions.

We could have a piece of code checking the number of cases in our neighborhood every day so we can get an alert if the number of cases exceeds a certain threshold. For example:


In [None]:
if 3 > 4:
    print("Something is wrong!")
    # here, more lines could follow
else:
    # If the `if` condition is not true, then this block gets executed instead.
    print("3 <= 4")

# These lines are not indented anymore.
# They will be executed regardless of the outcome of the condition.
print("After if")  


Notice the syntax: 
* The keyword `if` indicates the start of an `if`-statement. 
* A **conditional expression** (here `3 > 4`) follows, which evaluates (explicitly or implicitly) to a `bool`: either `True` or `False`. 
* A `:` ends the line of the `if`-statement. 
* An **indented** block follows. All contiguous lines sharing the same level of indentation (or deeper) belong to the same block and are executed only if the condition is met.
* After the `if` block, an `else` block is defined. This is optional!
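
The alert scenario described above could be sketched as follows. The variables `cases_today` and `threshold` and their values are made up for illustration; in a real setting the number of cases would come from some data source.

```python
cases_today = 120  # hypothetical number of cases, made up for illustration
threshold = 100    # hypothetical alert threshold

if cases_today > threshold:
    alert = "Alert: threshold exceeded!"
else:
    alert = "Everything is fine."

print(alert)
```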

Now think of a scenario where we have to decide which code to run based on three or more different conditions. Alternative, mutually exclusive conditions can be expressed with `elif`.

In [None]:
a = 3
b = 4
if a > b:
    print(f"{a} > {b}")
    # here, more lines could follow
elif a == b:
    print(f"{a} = {b}")
    # here, more lines could follow
else:
    # If none of the conditions above is true, then this block gets executed instead.
    print(f"{a} < {b}")


#### Loops

In [None]:
cities = ["Paris", "Saarbrücken", "Cambridge", "Barcelona"]
for city in cities:
    print(f'{city} is really beautiful!')

What's happening here?

1. `for NAME in COLLECTION:`. Just as `if`, a `:` is needed to end the line.
2. `NAME` will be assigned to the first element in `COLLECTION`.
3. The indented block(s) will be executed with `NAME = FIRST VALUE`.
4. `NAME` will take the second element.
5. The indented block(s) will be executed with `NAME = SECOND VALUE`
6. ... and so on. The loop ends when there are no more elements in the list to assign.
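
Written out by hand, the steps above correspond roughly to the following code for a two-element list. This is only meant to illustrate what the loop does for you; the list `short_list` is made up for this sketch.

```python
short_list = ["Paris", "Cambridge"]  # hypothetical two-element list

# What `for city in short_list:` does, written out step by step
city = short_list[0]  # NAME takes the first element
print(f"{city} is really beautiful!")
city = short_list[1]  # NAME takes the second element
print(f"{city} is really beautiful!")
# No more elements: the loop would end here
```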

> **Exercise**
>
> How many entries does `cities` contain? You can use `len(cities)` to check the answer, but you can also compute it with a `for` loop, a reassignable `int` variable and the `+` operator.

In [None]:
# space for the exercise

Note: There are more ways to _repeat actions_ in Python, like using `while` loops or recursion, but `for` is by far the most common!
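
For comparison, a minimal sketch of a `while` loop, which repeats its block as long as a condition is `True`. The variable `countdown` is made up for this example.

```python
# Repeat the indented block as long as the condition is True
countdown = 3
while countdown > 0:
    print(countdown)
    countdown = countdown - 1  # without this line the loop would never end

print("Done!")
```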

#### Introduction to functions

As you have seen, repeating a single operation is useful so you do not get bored writing the same lines again and again. In fact, avoiding repetition is one of the core concepts in programming! We are lazy and want to do as little as possible. This leads to mechanisms for reusing the same code in a flexible way.

This is what functions are for: **reusable pieces of code that can be parametrized**. This means they accept variables as arguments! They can optionally _return_ a result too. They are very similar to mathematical functions in that sense!


In [None]:
# Define a function
def square(number):
    output = number ** 2
    return output

# Use the function
result = square(2)
print(result)

Functions can be understood as tasks. A cook might be asked to perform a task, like "cutting vegetables". The function would contain the instructions for **how** to move the knife, but not **what** to cut, since the same instructions are valid for more than one vegetable. You then give, for example, a carrot to the cook (call the function with a "carrot" as argument) and get the cut carrot back.

Note that:
1. Defining a function does not mean you are using or _calling_ the function. 
2. If there's a return value, you also need to collect it. Otherwise it will be produced and discarded. Think of the cook you asked to cut the carrot. He did it properly and hands you the cut carrot, but you don't take it.

So, as a recap, in order to use a function, you first have to know:

1. If the function **exists**, either because you have written it or because you have imported it from another file (see below).
2. What **arguments** or **parameters** the function is expecting. A function can take arguments or not. Arguments can be required or optional; optional arguments have a default value and are often passed by name (keyword arguments).
3. If the function **returns** something. If it does not return anything, it doesn't mean it didn't _do_ anything. It might have some kind of side effect (writing a file to disk, for example).
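
The points above can be sketched with two small functions. Both `power` and `greet` are hypothetical functions written only for this illustration.

```python
def power(base, exponent=2):
    # `base` is required, `exponent` is optional and defaults to 2
    return base ** exponent

def greet(name):
    # No return statement: the function only has a side effect (printing)
    print(f"Hello, {name}!")

print(power(3))              # uses the default exponent
print(power(3, exponent=3))  # overrides the default by keyword

nothing = greet("Lisa")      # prints the greeting, but returns None
print(nothing)
```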


>*Exercise:*
> *You already know and have even used one particular function. Do you know what it is called and what it does?*

### 4 Modules and imports

Python lets you reuse _objects_ defined in other _modules_ via the [`import` system](https://docs.python.org/3/reference/import.html). A module is a file containing Python definitions and statements. You can easily identify them thanks to the `*.py` file extension.

> More info on the [Python documentation](https://docs.python.org/3/tutorial/modules.html)!

Python ships with a large battery of modules that you can use right away. Since it would be unfeasible to load all definitions directly when starting Python (it simply would take too long and would consume too much memory), only a very small subset is available upon initialization (like `len()`, `print()`, `range()`). The other ones must be _explicitly_ invoked or **imported**. That can be achieved with the `import` keyword.

For example, to get access to more mathematical functions like the square root, you need to import the `math` module, which defines `sqrt`. As you can see, the functions defined in `math` can be accessed via `.`.

In [None]:
import math
help(math.sqrt)
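
Once imported, the function can be called like any other. The sketch below also shows the alternative `from ... import ...` form, which imports a single name directly; the variable `root` is made up for this example.

```python
import math

# Access a function inside the module with the dot notation
root = math.sqrt(16)
print(root)

# Alternatively, import only the name you need
from math import sqrt
print(sqrt(16))
```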

### 5. Important Python Libraries

We already saw how to import Python libraries. Libraries are a collection of functionalities that enable you to perform many common tasks without writing the whole code yourself from scratch.

For instance, the `pandas` library provides the function `read_csv()` to read a comma-separated values (csv) file into a so-called `DataFrame`.


Now we take a closer look at the Python libraries 
* `pandas` and 
* `numpy` 

In [None]:
import numpy as np
import pandas as pd

Since module names are sometimes long (and programmers are lazy), we can alias them by using the keyword `as`. Whenever we need a function or definition from the `numpy` module, we can then refer to the module as `np`.

In [None]:
# calling the function "mean" from the numpy module, giving the list "[1,2,3,4]" as argument.
print(np.mean([1,2,3,4]))


**Tip**: You can check out the available functionalities of a library in this Jupyter notebook by writing the library name followed by a dot and then hitting the tab key. All available functionalities will pop up for you to explore. Since there are a lot of options, you can narrow them down by writing e.g. `read` while the popup window is open. 

**Note**: If you are working in Google Colab, you will first have to disable `Automatically trigger code completions` under `Tools` > `Settings` > `Editor` in order to be able to use this feature.

See for yourself all the possible file formats that you can read with `pandas`.

We will now use the `read_csv()` function ([see docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)) to load the file content as a `DataFrame` into the variable `smiles_df`. Since our files are tab-separated (`.tsv`) rather than comma-separated, we pass `sep='\t'`.

In [None]:
smiles_df = pd.read_csv("data/smiles.tsv", sep='\t')
smiles_df.head()

In [None]:
activity_to_ER = pd.read_csv('data/binary_activity.tsv', sep = '\t')
activity_to_ER.head()

We see that both tables contain the same compound IDs (column `DTXSID`). Now we want to join the two DataFrames into a single larger one along this compound ID column.

In [None]:
joined_df = smiles_df.set_index('DTXSID', drop = True).join(activity_to_ER.set_index('DTXSID', drop = True))
joined_df.head()
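
To see what `set_index()` followed by `join()` does without needing the data files, here is a minimal sketch on two tiny tables. All column names and values (`compound_id`, `smiles`, `active`) are made up for illustration and are not taken from the course data.

```python
import pandas as pd

# Two small made-up tables that share an ID column
left = pd.DataFrame({"compound_id": ["A", "B", "C"], "smiles": ["CCO", "CC", "C"]})
right = pd.DataFrame({"compound_id": ["A", "B", "C"], "active": [1, 0, 1]})

# Move the shared column into the index, then join on the index
toy_joined = left.set_index("compound_id").join(right.set_index("compound_id"))
print(toy_joined)
```

Rows are matched by their index value, so each compound ends up in one row with the columns of both tables.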

## 6. Exercises

6.1 When executing the following code, (why) will there be an error?

```python
x = 5
y = "5"

print(x + y)

```

space for solution

6.2 What do we need to change in order to get the correct output "10"?

In [None]:
# space for solution

6.3 What is the expected output of the following code?
```python
x = ""
for i in "python_is_amazing!":
    if i == "_":
        x += " "
    else:
        x += i
print(x)
```

space for solution

6.4 Read the file `data/continuous_response.tsv`. Now, join the continuous AC50 values with the `joined_df`.

In [None]:
# space for solution

## 7. Solutions

Before you take a look at the solutions, try to solve the exercises yourself! Talk to your fellow students. If you have a solution, then go ahead and take a look here.


<details><summary>Exercise 6.1</summary>

An integer and a string cannot be added or concatenated with `+`, so this raises a `TypeError`. In order to use `+`, you have to decide which behavior you want and convert one of the values to the matching type.
    
</details>


***


<details><summary>Exercise 6.2</summary>
   
```python
# Without changing the assignment, use `int(  )` to cast string to integer
print(x + int(y))    
```
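
For comparison, a small sketch of both possible conversions; the variable names `as_number` and `as_text` are made up for this illustration. Converting the string gives a numeric sum, while converting the integer gives a concatenated string.

```python
x = 5
y = "5"

as_number = x + int(y)  # numeric addition
as_text = str(x) + y    # string concatenation

print(as_number)
print(as_text)
```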
    
</details>

***

<details><summary>Exercise 6.3</summary>

The solution would be:
    
```python
"python is amazing!"

```

Why? Because we are iterating over the string contents, character by character, and adding them to a new string (`x`) one character at a time. However, before adding to `x` we first check if the character is a `_`, and in that case we add a space ` ` instead. In practice, this means we are replacing underscores with spaces. Of course there are simpler ways to do such a common operation! 
    
```python
x = "python_is_amazing!".replace("_", " ")
```
    
</details>

***

<details><summary>Exercise 6.4</summary>

The solution would be:
    
```python
continuous_response = pd.read_csv('data/continuous_response.tsv', sep ='\t')
full_joined_df = joined_df.join(continuous_response.set_index('DTXSID', drop = True))

```
    
</details>