# Teaching Python <img style="text-align:center" src="python.png" alt="Alt text" width="50" height="30" /> 

## Day 1: _The Basics_

**Frederic R. Hopp**  
**@freddy_hopp | @fhopp.bsky.social**  
**https://fhopp.github.io/**

**13.11.2023**

Amsterdam School of Communication Research

***

# Agenda 

## 1. Introductions

## 2. Why Python? 

## 3. Python Basics

## 4. Exercises

## ~ BREAK ~

## 5. Teaching Python

## 6. More Exercises

***

# Introductions 

## Who are you?
## Why are you here?
## What is your Python/R/SPSS/Excel background?

# Why Python? 

Science should be _transparent_ and _reproducible_ by **anyone**...

We need tools that are
- platform-independent
- free (as in beer and as in speech, gratis and libre)
- which implies: open source
 
This ensures that our research
- (a) can be reproduced by anyone
- (b) does not depend on a black box that no one can look inside.
    - ongoing open-science debate! ([van Atteveldt et al., 2019](https://ijoc.org/index.php/ijoc/article/view/10631))

![CCR](ccr_logo.png)

**[Vis (2013)](https://firstmonday.org/ojs/index.php/fm/article/view/4878)**

"[...] these [commercial] tools are often unsuitable for academic purposes because of their cost, along with the problematic 'black box' nature of many of these tools."

**[Mahrt and Scharkow (2013)](https://www.tandfonline.com/doi/abs/10.1080/08838151.2012.761700)**

" [...] we should resist the temptation to let the opportunities and constraints of an application or platform determine the research question[...]"

***

# Python 

## A language, not a program

# What? 

A language, not a program  

Huge advantage: flexibility, portability, (backward) compatibility

One of _the_ languages for data science. (The other being R)

- But: Python is (a lot!) more versatile. R for stats, Python for everything else. 

# Which version? 

We use Python 3. 

Ask me about virtual environments...

***

# Jupyter Lab

- A program to run Julia, Python, and R (JuPytR)
- Compare with RStudio (but more lightweight)
- At the heart: Notebooks (cf. R)
    - Run and document code
    - Create Slides
    - ...
- File browser, text editor, data viewer, ...

**Let's start JupyterLab!**

# Navigating Notebooks

The keyboard is your friend. The trackpad/mouse is not. 

_Hands on:_

Create a new notebook in jlab and open it.
1. Hit `ESC` to enter "cell selection" mode
2. Hit `A` to create new cell ABOVE current one
3. Hit `B` to create new cell BELOW current one
4. Hit `DD` to delete current cell
5. Use `UP` and `DOWN` arrow keys to navigate between cells
6. Hit `ENTER` to enter "cell editing" mode
7. When in "cell selection" mode (see #1), hit `M` to turn the cell into a `Markdown` cell (for text, headings, etc.; more later)
8. `SHIFT`+`ENTER` **executes** a cell. On Windows, `CTRL`+`ENTER` should also do it. 

# Python Lingo

In [3]:
# Basic datatypes (variables)
a_int = 37
a_float = 1.5
a_bool = True
a_string = "Python"
a_variable = "a_variable" # a_variable and "a_variable" are not the same!

print(type(a_int))
print(type(a_float))
print(type(a_bool))
print(type(a_string))

<class 'int'>
<class 'float'>
<class 'bool'>
<class 'str'>


a_variable and "a_variable" are not the same! 

"5" and 5 are not the same.

- But you can transform it: int("5") will return 5.

You cannot calculate 3 * "5"

- (In fact, you can. It's "555").

But you can calculate 3 * int("5")
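The difference between a string and a number can be tried out directly. The following is a minimal sketch; the variable names are made up for illustration:

```python
# Converting between strings and numbers
text_five = "5"              # a string
number_five = int(text_five) # the integer 5

repeated = 3 * text_five     # repeats the string three times
product = 3 * number_five    # ordinary multiplication

print(repeated)              # 555
print(product)               # 15
print(type(repeated), type(product))
```

Multiplying a string by an integer repeats the string, which is why no error is raised.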

In [2]:
# More advanced datatypes
## lists []
firstnames = ['Alice','Bob','Cecile'] 
lastnames = ['Garcia','Lee','Miller']
ages = [18,22,45]

# dictionary {x:y}
agedict = {'Alice': 18, 'Cecile': 45} 

In [3]:
# Retrieving items
print(firstnames[0]) # gives you the first entry
print(firstnames[-2]) # gives you the second-to-last entry 
print(firstnames[:2]) # gives you entries 0 and 1
print(firstnames[1:3]) # gives you entries 1 and 2 
print(firstnames[1:]) # gives you entries 1 until the end
print(agedict["Alice"]) # gives you 18

Alice
Bob
['Alice', 'Bob']
['Bob', 'Cecile']
['Bob', 'Cecile']
18


In [4]:
# Less frequent, more useful datatypes
## Sets 
a_set = {1,2,3} # collection of unique items
duplicates = [1,2,2,3,3,4]
no_duplicates = set(duplicates)
print(no_duplicates)
## tuple ()
a_tuple = (1,2,2,3) # like list, but immutable (cannot be changed) 

# ... many more later! 

{1, 2, 3, 4}


# Functions
Remember $f(x) = y$? 

Take an input (x), do something to it, and return a result (y)

Functions need to be "called" -> ()

E.g. len([1,2,3]) returns 3. 

# Some functions 
- len(x) # returns the length of x
- y = len(x) # assign the value returned by len(x) to y
- print(len(x)) # print the value returned by len(x)
- print(y) # print y
- int(x) # convert x to an integer
- str(x) # convert x to a string
- sum(x) # get the sum of x
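As a small sketch of how these functions behave, here they are applied to a made-up list (the values and variable names are only for illustration):

```python
x = [4, 8, 15]

y = len(x)             # number of items in the list
print(y)               # 3

total = sum(x)         # all items added up
print(total)           # 27

as_text = str(total)   # the string "27"
print(as_text + " is a string now")

back_to_int = int(as_text)   # and back to the integer 27
print(back_to_int + 1)       # 28
```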

In [5]:
# How could you print the mean of a list with those functions? 
# your_code_here

In [6]:
# sum(x) / len(x)

In [10]:
# Write your own function 
def addone(x):
    y = x + 1
    return y 

Reminder: Functions take some input (argument)  
(in this example, we called it x) and return some result.

In [12]:
# Thus, running:
addone(10)
# returns: 

11

In [16]:
my_string = "SCREAM"

In [23]:
my_string = my_string.lower()

In [24]:
my_string

'scream'

# Methods

Similar to functions, but directly associated with an **object**. 

Everything in Python is an **[object](https://linux.die.net/diveintopython/html/getting_to_know_python/everything_is_an_object.html#d0e4665)**. 

Methods are typically accessed with the "." operator:
- "SCREAM".lower() returns the string "scream"
- a_list.append("hi") adds "hi" to a list
- ...

Both functions and methods end with ().  
Between the (), `arguments` can (sometimes have to) be supplied.

## Some String Methods
mystring = "Hi! How are you?"  
mystring.lower() # return lowercased string (**doesn't change original!**)  
mylowercasedstring = mystring.lower() # save to a new variable  
mystring = mystring.lower() # or override the old one  
mystring.upper() # uppercase  
mystring.split() # splits on whitespace and returns a list  
['Hi!', 'How', 'are', 'you?']

=> **You can use TAB-completion in Jupyter to see all
methods (and properties) of an object!**
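
To see that string methods return a new value instead of changing the original, here is a short sketch that can be run as a single cell (variable names are illustrative):

```python
mystring = "Hi! How are you?"

lowered = mystring.lower()   # a new, lowercased string
words = mystring.split()     # a list of the whitespace-separated parts

print(mystring)   # the original is unchanged
print(lowered)
print(words)
print(len(words)) # 4
```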

# Modifying lists

Let's use one of our first methods! Each list has a method .append():

In [9]:
mijnlijst = ["element 1", "element 2"]
anotherone = "element 3" # note that this is a string, not a list!
mijnlijst.append(anotherone)
print(mijnlijst)

['element 1', 'element 2', 'element 3']


# Merging two lists (=extending)

In [10]:
mijnlijst = ["element 1", "element 2"]
anotherone = ["element 3", "element 4"]
mijnlijst.extend(anotherone)
print(mijnlijst)

['element 1', 'element 2', 'element 3', 'element 4']
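
A common stumbling block is using .append() where .extend() was meant. A minimal sketch of the difference, using made-up list names:

```python
list_a = ["element 1", "element 2"]
list_b = ["element 1", "element 2"]
more = ["element 3", "element 4"]

list_a.append(more)   # adds the whole list as ONE new (nested) element
list_b.extend(more)   # adds each element of the list separately

print(list_a)         # ['element 1', 'element 2', ['element 3', 'element 4']]
print(len(list_a))    # 3
print(list_b)         # ['element 1', 'element 2', 'element 3', 'element 4']
print(len(list_b))    # 4
```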


# Modifying Dicts

Adding a key to a dict (or changing the value of an existing key)

In [11]:
mydict = {"whatever": 42, "something": 11}
mydict["somethingelse"] = 76
print(mydict)

{'whatever': 42, 'something': 11, 'somethingelse': 76}


**If a key already exists, its value is simply replaced.**
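
A short sketch of that replacement behavior, together with two ways of checking for a key first (the keys and values are made up):

```python
mydict = {"whatever": 42, "something": 11}

mydict["something"] = 99      # key exists: the old value 11 is replaced
mydict["brandnew"] = 1        # key does not exist: a new entry is added
print(mydict)

print("whatever" in mydict)   # True: check whether a key exists
print(mydict.get("missing", 0))  # 0: fall back to a default if the key is absent
```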

***

# Structuring our Program

If we want to repeat a block of code, execute a block of code only under specific conditions, or more generally want to structure our code, we use **indentation**.

Indentation: The Python way of structuring your program:
- Your program is structured by TABs or SPACEs.
- Jupyter (or your IDE) handles (guesses) this for you, but make sure not to interfere and not to mix TABs and SPACEs!
- Default: four spaces per level of indentation

## Structure
A first example of an indented block. In this case, we want to _repeat_ this block:

In [28]:
agedict = {'Zeus': None, 'Denis': 96, 'Alice': 18, 
           'Rebecca': 20 , 'Bob': 22, 'Cecile': 45}
myfriends = ['Alice','Bob','Cecile']

print ("The names and ages of my friends:")

counter = 0
for buddy in myfriends:
    print("I am in the loop:", counter)
    print(buddy)
    counter += 1
    # print(f"My friend {buddy} is {agedict[buddy]} years old")

The names and ages of my friends:
I am in the loop: 0
Alice
I am in the loop: 1
Bob
I am in the loop: 2
Cecile


## What happened here?

In [13]:
for buddy in myfriends:
    print (f"My friend {buddy} is {agedict[buddy]} years old")

My friend Alice is 18 years old
My friend Bob is 22 years old
My friend Cecile is 45 years old


### The for loop: 
1. Take the first element from myfriends and call it buddy (like buddy = myfriends[0]) (line 1)
2. Execute the indented block (line 2, but could be more lines)
3. Go back to line 1, take the next element (like buddy = myfriends[1])
4. Execute the indented block ...
5. ... repeat until no elements are left ...

### The f-string

If you prepend a string with an f, you can use curly brackets {} to insert the value of a variable. 
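
A quick sketch of what the f changes, using made-up values. Note that the curly brackets can also hold a small expression:

```python
buddy = "Alice"
age = 18

with_f = f"My friend {buddy} is {age} years old"
without_f = "My friend {buddy} is {age} years old"    # no f: the braces stay as typed
with_expression = f"Next year, {buddy} will be {age + 1}"

print(with_f)           # My friend Alice is 18 years old
print(without_f)        # My friend {buddy} is {age} years old
print(with_expression)  # Next year, Alice will be 19
```

Forgetting the f is an easy mistake to make: Python raises no error, it just prints the braces literally.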

***

The line _before_ an indented block starts with a `statement` indicating what
should be done with the block and ends with a `:` 

#### More generally, the : + indentation indicates that

- the block is to be executed repeatedly (`for` statement) e.g., for each element from a list, or until a condition is reached (`while` statement)
- the block is only to be executed under specific conditions (`if`, `elif`, and `else` statements)
- an alternative block should be executed if an error occurs in the block (`try` and `except` statements)
- a file is opened, but should be closed again after the block has been executed (`with` statement)
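
The `while` and `with` statements follow the same pattern of a statement, a colon, and an indented block. A minimal sketch, assuming a temporary folder and a made-up file name (the `import` lines load standard-library modules):

```python
import os
import tempfile

# while: repeat the block as long as the condition is true
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1
print(steps)          # [3, 2, 1]

# with: the file is closed automatically once the block ends
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "with_demo.txt")  # hypothetical file name
    with open(demo_path, "w") as f:
        f.write("hello")
    file_closed = f.closed   # checked after the inner block has finished
print(file_closed)    # True
```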

### Can we also loop over dicts?

Sure! But we need to indicate how exactly: 

In [14]:
mydict = {"A":100, "B": 60, "C": 30}

for k in mydict: # or mydict.keys()
    print(k)

print('---')
for v in mydict.values():
    print(v)

print('---')
for k,v in mydict.items():
    print(f"{k} has the value {v}")

A
B
C
---
100
60
30
---
A has the value 100
B has the value 60
C has the value 30


### if statements

#### Structure
Only execute block if condition is met

In [15]:
x = 5
if x <10:
    print(f"{x} is smaller than 10")
elif x > 20:
    print(f"{x} is greater than 20")
else:
    print(f"No previous condition is met, therefore 10<={x}<=20")

5 is smaller than 10


#### Can you see how such an if statement could be particularly useful when nested in a for loop?
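
One possible answer, as a sketch: the loop visits every item and the if statement decides what to do with each one. The age threshold of 21 and the labels are arbitrary choices for illustration:

```python
agedict = {'Zeus': None, 'Denis': 96, 'Alice': 18,
           'Rebecca': 20, 'Bob': 22, 'Cecile': 45}

labels = []
for name, age in agedict.items():
    if age is None:            # no age recorded
        label = "unknown"
    elif age < 21:
        label = "under 21"
    else:
        label = "21 or older"
    labels.append(label)
    print(f"{name}: {label}")
```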

### try/except

**Structure**  
If executed block fails, run another block instead

In [16]:
x = "5"
try:
    myint = int(x)
except:
    myint = 0

Again, more useful when executed repeatedly (in a loop or function):

In [17]:
mylist = ["5", 3, "whatever", 2.2]
myresults = []
for x in mylist:
    try:
        myresults.append(int(x))
    except:
        myresults.append(None)
print(myresults)

[5, 3, None, 2]
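
A bare `except:` catches every possible error, including ones you did not anticipate. As a hedged variation on the loop above, you can name the errors you expect; the extra `None` entry is added here only to trigger a second kind of error:

```python
mylist = ["5", 3, "whatever", 2.2, None]
myresults = []
for x in mylist:
    try:
        myresults.append(int(x))
    except (ValueError, TypeError):
        # ValueError: text that is not a number; TypeError: e.g. None
        print("Could not convert", x)
        myresults.append(None)
print(myresults)   # [5, 3, None, 2, None]
```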


### List comprehensions

**Structure**  
A for loop that .append()s to an empty list can be replaced by a _one-liner_: 

In [18]:
mynumbers = [2,1,6,5]
mysquarednumbers = []
for x in mynumbers:
    mysquarednumbers.append(x**2)

is equivalent to:

In [19]:
mynumbers = [2,1,6,5]
mysquarednumbers = [x**2 for x in mynumbers]

Optionally, we can have a condition:

In [20]:
mynumbers = [2,1,6,5]
mysquarednumbers = [x**2 for x in mynumbers if x>3]
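
For comparison, here is a sketch of the conditional version written out as a full loop; both variants should give the same list (the two result names are made up):

```python
mynumbers = [2, 1, 6, 5]

# The written-out loop with an if statement inside
squares_loop = []
for x in mynumbers:
    if x > 3:
        squares_loop.append(x**2)

# The same thing as a list comprehension
squares_comprehension = [x**2 for x in mynumbers if x > 3]

print(squares_loop)                           # [36, 25]
print(squares_loop == squares_comprehension)  # True
```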

### List comprehensions

**A very pythonic construct**
- Every list comprehension can also be written as a for loop that appends to a new list to collect the results.
- For very complex operations (e.g., nested for loops), it can be easier to write out the full loops.
- But mostly, list comprehensions are really great! (and much more concise!)

**⇒ You really should learn this!**

***

# ☕ 15 Minute Break ☕

***

# Exercises 

## Exercise 1: Working with lists

### 1. Warming up

- Create a list, loop over the list, and do something with each value (you're free to choose). 

### 2. Did you pass?

- Think of a way to determine for a list of grades whether they are a pass (>= 5.5) or fail.
- Can you make that program robust enough to handle invalid input (e.g., a grade such as 'ewghjieh')?
- How does your program deal with impossible grades (e.g., 12 or -3)?
- Any other improvements?

In [21]:
grades = [4, 7.8, -3, 3.6, 12, 9.1, "4.4", "KEGJKEG", 4.2, 7, 5.5]

for grade in grades:
    try:
        grade_float = float(grade)
        if grade_float >10:
            print(grade_float,'is an invalid grade')
        elif grade_float <1:
            print(grade_float,'is an invalid grade')
        elif grade_float >= 5.5:
            print(grade,'is a PASS')
        else:
            print(grade,'is a FAIL')

    except:
        print('I do not understand what',grade,'means')

4 is a FAIL
7.8 is a PASS
-3.0 is an invalid grade
3.6 is a FAIL
12.0 is an invalid grade
9.1 is a PASS
4.4 is a FAIL
I do not understand what KEGJKEG means
4.2 is a FAIL
7 is a PASS
5.5 is a PASS


## 3. Working with dictionaries 

**Instructions**:
You are creating a program to conduct a media preferences survey among a group of friends. The survey will gather information about the monthly movie-watching habits, daily music-listening hours, and monthly book-reading habits of each friend.

**Steps**:
1. Enter the number of friends participating in the survey.
2. For each friend, enter their name and answer the questions about their media preferences.
3. The program will store the information in a dictionary.
4. After collecting the data, the program will analyze and present a summary of each friend's preferences.
5. Finally, the program will calculate and display the average media preferences of the group.

### Tips: 

#### Initialize an empty dictionary to store friends' media preferences  
media_preferences = {}

#### The input function (it returns a string, so convert it with int()):  
num_friends = int(input("Enter the number of friends participating in the survey: "))

#### Looping over a range of values
```
for i in range(1, num_friends + 1):  
    friend_name = input ...  
    ...
    media_preferences ...
```

In [22]:
"""
# Initialize an empty dictionary to store friends' media preferences
media_preferences = {}

# Gather information about friends' media preferences
num_friends = int(input("Enter the number of friends participating in the survey: "))
for i in range(1, num_friends + 1):
    friend_name = input(f"Enter the name of friend #{i}: ")
    movies = int(input(f"How many movies does {friend_name} watch in a month? "))
    music = int(input(f"How many hours of music does {friend_name} listen to in a day? "))
    books = int(input(f"How many books does {friend_name} read in a month? "))

    # Store the information in the dictionary
    media_preferences[friend_name] = {'Movies': movies, 'Music': music, 'Books': books}

# Analyze and present the data
print("\nMedia Preferences Summary:")
for friend, preferences in media_preferences.items():
    print(f"\n{friend}'s Preferences:")
    print(f"Movies: {preferences['Movies']} per month")
    print(f"Music: {preferences['Music']} hours per day")
    print(f"Books: {preferences['Books']} per month")

# Calculate and display average preferences
total_movies = sum(preferences['Movies'] for preferences in media_preferences.values())
total_music = sum(preferences['Music'] for preferences in media_preferences.values())
total_books = sum(preferences['Books'] for preferences in media_preferences.values())

average_movies = total_movies / num_friends
average_music = total_music / num_friends
average_books = total_books / num_friends

print("\nAverage Preferences:")
print(f"Movies: {average_movies} per month")
print(f"Music: {average_music} hours per day")
print(f"Books: {average_books} per month")
"""


***

# 🍲 Lunch Break until 13:15 🍲

***

# Teaching Python 🐍

## Installation 
- JupyterLab Desktop (what we use)
- Anaconda (A suite of programs, including jlab, R, ...)
    - Pro: Virtual environments, more flexibility
    - Con: Large installation, slow on older machines
- [Google Colab](https://colab.research.google.com/)
    - Pro: No install; run code in the "cloud"; can be run on all machines
    - Con: Google account needed; mounting data not trivial, ...

- JupyterHub (not currently implemented at ASCoR, but my ideal solution...)
    - Server for multiple users, each with own environment
    - Try it [here](https://jupyter.org/try-jupyter/)



# General Tips

1. Make it project/goal-oriented and personally meaningful (=fun).
    - Basics are required, but get hands-on quickly!
    - Let students choose their own project within limited scope
    - Create milestones that lead towards goal

2. Patience is a virtue
    - Stress steep learning curve
    - ... but also fast results!
  
3. Create an integrative learning environment
    - Limit extra software, but keep it centralized (more soon!)
    - If you have TAs: Offer code reviews
  
4. Programming as a way of thinking
    - Structured problem solving
    - Recognize complex problem, try to break it into individual steps
    - OR: think of simplest case/version of that problem and work upwards 

***

# Jupyter Presentations 

Using Jupyter, you can turn any notebook into an (interactive) presentation. 

This can be helpful to keep everything centralized (text + code), and you can even execute code while in presentation mode. 

### Exercise: Create a new notebook and turn it into a presentation. 

**Note.** Click on the top-right "Settings icon", then `common tools` and then modify the `Slide Type` of a cell. 

- Slide: A new slide
- Sub-slide: New slide "below" the current one
- Fragment: Appears as a new "cell" on your current slide 

To turn the notebook into a presentation, we need a particular JavaScript package called reveal.js.

Run the following commands in your notebook to download it into the same folder as your .ipynb  
Windows users have to start these commands with a `/`. 

`!git clone https://github.com/hakimel/reveal.js/`

Next, run the following to convert your notebook (here `your_notebook.ipynb`) into a presentation:

`!jupyter-nbconvert --to slides your_notebook.ipynb --reveal-prefix=reveal.js` 

And finally, launch your presentation:

`!jupyter-nbconvert --to slides your_notebook.ipynb --post serve` 


# Running code live with RISE 

To get even more out of this, install RISE: 

`pip install jupyterlab_rise` 

Restart jlab, open a notebook (that contains a code cell), and click on the little presentation icon on the top right to start RISE. 

It should now be possible to execute code "live" in the presentation. 


# Jupyter Book

For even more centralization, you can structure all your materials in a `jupyter book`. 

[Jupyter book](https://jupyterbook.org/en/stable/intro.html) allows you to create a "website" where you can host all your notebooks, course materials, etc. 

As an example, check out our [Data Journalism](https://fhopp.github.io/data_journalism/content/intro.html) book. 

Another great feature of Jupyter Book is that students can use Hypothesis (hypothes.is) to highlight text and ask questions in a notebook. 

**=> Creating a book takes some time, but it sure is worth the effort.** 

***

# More Python! 

I'd like to introduce you to two more concepts to prepare you for the next sessions. 

1. Modules
2. Paths and Files 

## Modules

Modules are essentially "software packages" (think: Word, PPT, ...) designed for various tasks.  
You will learn the common ones in this course, but for now, let's focus on their basic usage.

To be used, modules always need to be imported (and sometimes installed, stay tuned!):

In [23]:
# os is useful for various "operating system" related things
import os 

# Show current working directory
pwd = os.getcwd()
print('This code is run here:', pwd)


This code is run here: /Users/fhopp/Library/Mobile Documents/com~apple~CloudDocs/FRH/Science/ASCoR/Teaching/TTP


In [24]:
# Sometimes you want to measure how long a certain operation takes
import time 

now = time.time()
# How long does it take Python to loop 100 million times?
for i in range(0,100000000):
    continue
end = time.time()
total_time = round((end-now),3)
print(f"This cell took {total_time} seconds to run.")

This cell took 2.165 seconds to run.


In [25]:
# To install a module/package, you can use pip 
# pip  install pandas # uncomment to run
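
Imports come in a few flavors. The sketch below uses only modules that ship with Python, so nothing needs to be installed; the alias `rnd` and the variable names are arbitrary choices:

```python
import math                   # import the whole module
from statistics import mean   # import a single function from a module
import random as rnd          # import a module under a shorter alias

root = math.sqrt(16)          # functions are accessed with module.function()
average = mean([1, 2, 3, 6])  # imported directly, so no prefix is needed
dice = rnd.randint(1, 6)      # a random whole number from 1 to 6

print(root)      # 4.0
print(average)   # 3
print(dice)
```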

# Paths

Apologies in advance, this one is tricky. 

When we want to load files (e.g. a csv) into Python to work with them, we have to tell Python where each file is located on our machine. 

As an example, let's create a simple text file using jlab and write a few things into it. 

Say we named it `my_file.txt` and placed it in the **same** folder as our notebook. If this is the case, we can simply open it like so:

In [28]:
my_file = open("my_file.txt")
# Read the whole content into a string (.read() is a method of file objects)
content = my_file.read()
# Close the file again once we are done with it
my_file.close()
# Print the content
print(content)

This is my file. 

It is empty. Just like me. 


Now create a folder and place the file in it.

In [29]:
# This will now throw an error because the path to the file is no longer correct!
# my_file = open("my_file.txt")

# Instead, we need to add the folder to the file path (note the / to indicate folders)
my_file = open("files/my_file.txt")

Finally, should your files reside in a galaxy far, far away, it is best to get the whole path to the folder, and keep the path as a variable:

In [30]:
DATA_PATH = "/Users/fhopp/Library/Mobile Documents/com~apple~CloudDocs/FRH/Science/ASCoR/Teaching/TTP/files"

# You can then list all files in that folder using the os module
my_files = os.listdir(DATA_PATH)

***

# How may for-loops be helpful for loading files?
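
One possible answer, as a sketch: loop over the file names in a folder and read each file in turn. To keep the example self-contained, it first creates two small files in a temporary folder; the file names and their content are made up, and in practice `DATA_PATH` would point to your own folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as DATA_PATH:   # stand-in for your own folder
    # Create two example files (hypothetical names and content)
    for name, text in [("a.txt", "first file"), ("b.txt", "second file")]:
        with open(os.path.join(DATA_PATH, name), "w") as f:
            f.write(text)

    # Loop over all .txt files in the folder and collect their content
    contents = {}
    for filename in sorted(os.listdir(DATA_PATH)):
        if filename.endswith(".txt"):
            full_path = os.path.join(DATA_PATH, filename)
            with open(full_path) as f:
                contents[filename] = f.read()

print(contents)   # {'a.txt': 'first file', 'b.txt': 'second file'}
```

Using `os.path.join` to build the path avoids having to type `/` or `\` by hand, which differ between operating systems.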

***

# Which questions do you have? 

***
