<a href="https://colab.research.google.com/github/worldbank/dec-python-course/blob/main/1-foundations/1-types-and-syntax/foundations-s1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DEC Foundations to Python - Session 1
# Variable types and Python syntax

 

# S0.0 - Introduction

## S0.1 - The building blocks of Python

#### The Atoms of Python
In this session we will cover the basic building blocks that everything in Python is made of. **Think of these building blocks as the atoms of Python**. 

There are 5 types of atoms in Python you are likely to ever interact with as a Data Scientist. 

<img src=https://upload.wikimedia.org/wikipedia/commons/6/6f/Stylised_atom_with_three_Bohr_model_orbits_and_stylised_nucleus.svg width="200">

#### The Containers of Atoms of Python
To organize and give structure to these atoms, there are 4 types of containers they can be stored in. Without these Python would just be a soup of atoms. Think of these as the chemical bonds between atoms in Python.

<img src=https://eepratibha-gallery.s3.ap-south-1.amazonaws.com/uploadimages/Tenthclass-humaneye-05-01-2021-118.jpg width="300">

#### The Molecules of Python
These atoms and containers are combined into something called objects. **We can think of these objects as molecules**. 

Objects can be simple or very complex. But no matter how complex something you encounter in Python is, it can always be traced back to a combination of the atoms and their containers.

With these atoms and molecules, we can make everything from databases to machine learning algorithms and natural language projects, or whatever else you will end up using Python for. 

<img src=https://upload.wikimedia.org/wikipedia/commons/e/e8/Sucrose_molecule_3d_model.png width="300">


## S0.2 - Do I really need to care about these building blocks?

Maybe you are thinking right now: 

"**_I am not super-techy and never intend to develop my own custom data structures. I just want to use Python for some cool data science. So why is this person talking about the inner fabric of Python and CHEMISTRY!?!_**".

If you load some data into a dataset in Python, it will be stored in an object. While you do not need to understand the chemical make-up of an object to work with it, the object will expect you to provide inputs using atoms and containers when you want to modify it, analyze the data in it etc.

And after any operation it almost always returns the result in terms of a Python atom or container of atoms that you need to know how to identify and handle. But identifying and handling these atoms and containers does not require that you are an expert in them.

---

#### The scope of this session
This session will show how to identify the 5 atoms and the 4 containers, and the basics of how to interact with them. But we will spend most of the time on 3 of the atoms and 2 of the containers.

After this session you will know enough to be able to use them in relation to other objects in Python.

You will see how we come back to these basic types in the following sessions, when we interact with Python molecules developed for us by other users.

---

## S0.3 - Google Colab

<img src=https://miro.medium.com/max/986/1*pimj8lXWwZnqLs2xVCV2Aw.png width="500">

Click this link to open the file you are currently viewing in Google Colab: https://colab.research.google.com/github/worldbank/dec-python-course/blob/main/1-foundations/1-types-and-syntax/foundations-s1.ipynb.

This will open an exact copy of this file in Google Colab. Since it is a copy, you can make edits in it without affecting anyone else's file. Throughout this course we expect you to always open the file for each session in Colab and follow along.

### What is Colab?

* It's like Google Docs but for Python code
* Requires no installing of Python itself or common libraries (add-ons)
* Runs on a Google server. Any files you save in Colab are saved in your Google Drive - so a very bad place for sensitive data
* Unfortunately, you need to be logged in to a Google account to run code on Google Colab.

**Do not use Google Colab for any non-public data**

---

### How to run code in Colab

Jupyter Notebook and Colab are organized in cells. A cell can contain either code or text. The only purpose of text cells is to provide information to a human reader. This information can be a few comments to the code, or a full research paper. You can format this text using [markdown](https://commonmark.org/help).

Code cells are where you write your Python code. Next to each code cell there is a play button. You can run the code by either clicking the 'play' icon or selecting the cell and hitting `SHIFT-ENTER` on your keyboard. 

Try running the cell below that says `2 + 2`.

In [None]:
2 + 2

### What to use for non-public data?

There are alternatives to Google Colab that organize text and Python code in blocks. These are called notebooks. 

**Jupyter Notebooks**. The most common tool to run Python code in notebooks on your own computer is called _Jupyter Notebooks_. You can install _Jupyter Notebooks_ on your computer and read data and other files directly from your computer, so that no files need to be shared over a server owned by Google. 

On WB computers, the consensus seems to be that the easiest way to install and use Python is by requesting ITS to install Anaconda (https://www.anaconda.com/) for you.

**Databricks**. If you want to have a collaborative space in the cloud that is still approved at the WB for non-public data you can use Databricks. Databricks is also a notebook-styled Python interface. An instance in Databricks can be made more computationally powerful than what you will ever need. 

In the following sessions you will be given the option to open the sessions in a WB-hosted Databricks session where you can share non-public data.

# S1.0 - Variables

So atoms, containers and objects are the types of data or information we can have. But we need a way to identify each piece of data or information. For this we use variables.

All variables consist of three things:

* The name of the variable so it can be uniquely identified and accessed
* The "*type*" of the information stored (atom, container or object) 
* The information/data that the variable holds

Variables are only stored in temporary memory, 
so when restarting Python, 
you need to recreate them by running your code again.

(**For Stata users:** In Stata, 
"variable" always means a column in a dataset. 
Variables in Python behave more like a `local` in Stata.)

In [None]:
# Create variables with the names text_variable and number_variable
text_variable = 'Hello World'
number_variable = 42

In [None]:
# Access the variables with the names text_variable and number_variable
# and then print the information they store
print(text_variable)
print(number_variable)

Where is a variable saved? Variables are only stored in RAM. RAM is very fast, but it is cleared each time you restart Python. So this memory is only for work-in-progress variables.

If you need to save data in a variable to a file then you need to save to disk memory. This allows the data to be accessed by other programs and the data will still be there next time you start Python.

This is the same no matter if you use Google Colab on a Google server, Jupyter Notebooks on your computer, or Databricks on a World Bank server. We will cover how to save to disk memory later.
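
As a small preview of the difference between RAM and disk, the sketch below writes the text in a variable to a file and reads it back. The file name `hello.txt` and the temporary folder are assumptions made for this illustration only; how the course itself saves data is covered in a later session.

```python
import os
import tempfile

text_variable = 'Hello World'

# Hypothetical file location: a temporary folder created just for this example
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'hello.txt')

# Write the value in the variable to a file on disk
with open(file_path, 'w') as f:
    f.write(text_variable)

# Read the file back into a new variable
with open(file_path) as f:
    text_from_disk = f.read()

print(text_from_disk)
```

The variable `text_from_disk` is again stored in RAM, but the file stays on disk after Python is restarted.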

# S2.0 The basic data types

These 5 basic types are the only ones you are ever likely to use:

| Class name | Full name      | Name used       | Usage                             |
|:---        |:---            |:---             | :---                              |
| int        | Integer        | "int"/"integer" | Number without decimal point      |
| float      | Floating point | "float"         | Number with decimal point         |
| str        | String         | "string"        | Text                              |
| bool       | Boolean        | "boolean"       | Either true or false              |
| NoneType   | None           | "none"          | An explicit way of saying nothing |

Any information in Python you will ever interact with is a combination of these types.
This is similar to how tiny simple atoms in real life
can be combined to the most wonderful complex life forms.
This is why we in this training refer to
**the basic data types as the _atoms of Python_.**

## S2.1 Numeric variables

**Define a numeric variable:**

In [None]:
# Assign the value 6 to a variable we name x
x = 6

Now somewhere in memory there is a variable with the name `x` that currently stores the value 6.

We can reference this variable until we explicitly delete it or restart our Python session.

In [None]:
# We can output the value by calling it 
x

**Ex. 1a:** (example exercise - do together)

In [None]:
# Create a variable called ex1_x and set it to the value 5

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex1_x == 5

**Do math using a variable:**

In [None]:
# Take the value in x and output that value plus 1
x + 1

In [None]:
# The value of x is still 6
x

In [None]:
# To update the variable x we need to overwrite it with a new value
# Assign x + 1 to x and output it
x = x + 1  # NOTE: this OVERWRITES the variable x
x

Note that we can only output a variable if it is by itself on the last line in a cell. We will soon learn how to _print_ a variable where this is not the case and where we have more options.

---
**Important error message: NameError**

Whenever you see an error where it says "not defined", as in `NameError: name 'z' is not defined`, then it means that you have tried to reference a variable `z` but that there is no variable with that name.

In [None]:
x = z + 4

**More math and using multiple variables:**

In [None]:
# Reset x to 6
x = 6

In [None]:
# Define a second variable - this time with a longer name
my_long_variable_name = 2

In [None]:
# Adding two variables together
x + my_long_variable_name

In [None]:
# Subtracting my_long_variable_name from x
x - my_long_variable_name

In [None]:
# Multiplying x with my_long_variable_name
x * my_long_variable_name

Here is a table of the most common mathematical operators:

| Symbol | Operation      | Example     |
|:---:   |:---            |:---         |
| `+`    | Addition       | `6 + 2` = 8    |
| `-`    | Subtraction    | `6 - 2` = 4    |
| `*`    | Multiplication | `6 * 2` = 12   |
| `/`    | Division       | `6 / 2` = 3.0  |
| `**`   | Power of       | `6 ** 2` = 36  |
| `%`    | Modulus        | `6 % 2` = 0 , `6 % 4` = 2 |
| `//`   | Floor division | `6 // 2` = 3 , `6 // 4` = 1 |

See full list of mathematical operators here: https://www.w3schools.com/python/python_operators.asp
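
The three division-related operators are easy to mix up, so here is a minimal sketch that runs them side by side. The variable names `a`, `b` and `rebuilt` are made up for this example.

```python
a = 7
b = 2

print(a / b)    # division always gives a float
print(a // b)   # floor division drops the fractional part
print(a % b)    # modulus gives the remainder
print(a ** b)   # 7 to the power of 2

# Floor division and modulus fit together:
# (whole number of times b fits in a) * b + remainder gives back a
rebuilt = (a // b) * b + (a % b)
print(rebuilt)
```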

---

If we want to save the result of a mathematical operation we need to store it in a variable. Either in a new variable or by overwriting an existing one.

Only variables left of the assignment operator `=` are modified. If there is no `=` then no variable is modified by a mathematical operator.

In [None]:
# Create a new variable that is x multiplied by my_long_variable_name
y = x * my_long_variable_name

# Create a new variable that is the sum of x and my_long_variable_name
z = x + my_long_variable_name

If we want to print multiple variables in the same cell we need to use `print()`

In [None]:
# Print the variables one at a time
print(x)
print(my_long_variable_name)
print(y)
print(z)

In [None]:
# Print all variables on the same line
print(x, my_long_variable_name, y, z)

In [None]:
# You can also print the results of an operation
print(12 * 89)
print(y - 20)

In [None]:
# You can combine printing and output
print(y - 20)
5 ** 3

Since incrementing a variable with a value, such as in `x = x + 1`, is such a common action, there is a shorthand for it: `x += 1`.

See what other operators you can use like this here: https://www.w3schools.com/python/python_operators.asp

In [None]:
count_matches = 5
count_matches += 2 # This is identical to: "count_matches = count_matches + 2"
print(count_matches)
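
The same shorthand exists for the other mathematical operators. A short sketch, using a made-up variable `budget`:

```python
budget = 100
budget -= 20    # same as: budget = budget - 20
budget *= 2     # same as: budget = budget * 2
budget /= 4     # same as: budget = budget / 4 (division gives a number with a decimal point)
print(budget)
```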

**Ex. 2a** (do ex 2a, 2b and 2c independently ~ 5 min)

In [None]:
# Create two variables ex2_x and ex2_y. Set ex2_x to 3 and ex2_y to 5.

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex2_x == 3 and ex2_y == 5

**Ex. 2b**

In [None]:
# Multiply ex2_x with ex2_y and save the result in a new variable ex2_z

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex2_z == 15

**Ex. 2c**

In [None]:
# Update the variable ex2_z by subtracting ex2_x from it
# (Hint: re-run the cells above if/when needed)

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex2_z == 12

## S2.2 Two basic types of numeric variables

There are two types of numeric basic data types:

| Class name | Full name      | Name used       | Usage                        |
|:---        |:---            |:---             | :---                         |
| int        | Integer        | "int"/"integer" | Number without decimal point |
| float      | Floating point | "float"         | Number with decimal point    |


An `int` stores whole numbers exactly but cannot store decimals. 
Python will pick `int` for you 
unless your variable must be a `float` to store your data without information loss.

Read more about `int` and `float` here: https://www.w3schools.com/python/python_numbers.asp

---

You can test which type your numeric variable is using `type()`

In [None]:
# Numeric variables assigned a number WITHOUT decimal point are created as an int
x = 3
print(type(x))
type(x)

In [None]:
# Numeric variables assigned a number WITH decimal points are created as a float
pi = 3.141592
print(type(pi))

In [None]:
# Python automatically assigns the appropriate type
diameter = 10
print(diameter, type(diameter))

# The result of division with / is always a float
radius = diameter / 2
print(radius, type(radius))

In [None]:
# The result of an operation with a float and an int is always a float
radius = 5
circumference = 2 * pi * radius
print(radius, type(radius))
print(circumference, type(circumference))

In [None]:
# You can force a float to be an int - int() drops the decimals
# NOTE: This leads to information loss about the decimal points
y = int(7.25)
print(y, type(y))

In [None]:
# This is not rounding, it just takes the integer part of the float
# and drops the decimals - rounding exists but it's not int()
z = int(7.99999)
print(z, type(z))
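
Since `int()` only drops the decimals, here is a minimal sketch of how it compares with actual rounding. It uses the built-in function `round()` and `math.floor()` from Python's standard library; the variable names are made up for this example.

```python
import math

value = 7.99999
negative = -7.25

print(int(value))            # drops the decimals
print(round(value))          # rounds to the nearest int
print(round(value, 2))       # rounds to 2 decimals, the result is a float
print(int(negative))         # drops the decimals, so the result moves toward zero
print(math.floor(negative))  # always rounds down
```

For positive numbers `int()` and `math.floor()` give the same result. They only differ for negative numbers.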

In [None]:
# Python changes the type if needed
salary = 17
print(salary, type(salary))
#increase for inflation
salary = salary * 1.05
print(salary, type(salary))

**Ex. 3a** (do ex 3a, 3b and 3c independently ~ 5 min)

_Hint:_ Mathematical operators: https://www.w3schools.com/python/python_operators.asp

In [None]:
# Create a variable ex3_x that is 13 to the power of 12

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex3_x == 23298085122481

**Ex. 3b**

In [None]:
# Create a variable ex3_y that is 
# the remainder when dividing ex3_x with 17

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex3_y == 1

**Ex. 3c**

In [None]:
# Create a variable ex3_z that is a float with the value three
# (The solution has not been mentioned explicitly)

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert hash(ex3_z) == 3 and type(ex3_z) is float

## S2.3 Text variables - basic type: string



There is only one basic data type for text, and it is called "string".

| Class name | Full name      | Name used       | Usage                        |
|:---        |:---            |:---             | :---                         |
| str        | String         | "string"        | Text                         |

The text in a string could be anything from a single letter or word, to a full-length text like an essay. 

---

**Define a string variable**

In [None]:
# Assign the text Hello World! to both variable a and b

# We can use either " or ' to tell where the text starts and ends so python does not confuse it for code
a = "Hello world!"
b = 'Hello world!'

print(a, type(a))
print(b, type(b))

We must use either `""` or `''` for each string, we cannot mix. It only rarely matters which one we use.

In [None]:
# We can use "" when the text includes one or several '
a = "Strings are Python's way to store text"

# We can use '' when the text includes one or several "
b = 'Python is the "bestest" programming language'

print(a, type(a))
print(b, type(b))

**Simple string operations:**

Some math operators work on strings as well

In [None]:
a = 'hello'
b = 'world'

# Addition and multiplication work on strings (but not subtraction and division)
c = a + ' ' + b + '!'
d = a * 3

print(c, type(c))
print(d, type(d))

In [None]:
# Now that we know strings, we can add a string 
# to the print function to 
# keep track of what we are printing
print('Variable c:', c, type(c))
print('Variable d:', d, type(d))
# Only the last line of a cell is shown as output, so c is not shown here
c
d

**Ex. 4a** (do ex 4a, 4b independently ~ 10 min)

In [None]:
# Create a variable ex4_x that is a string with the word World
# and a variable ex4_y that is a string with the word Bank

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex4_x == 'World' and ex4_y == 'Bank'

**Ex. 4b**

In [None]:
# Create a variable ex4_z that uses ex4_x and ex4_y 
# to create the string World Bank

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex4_z == 'World Bank'

## S2.4 Methods

Atoms, containers and objects can all hold different types of information. 
But Python wouldn't be that useful if we could not do anything with the information they hold.

Each type of atom, container and object comes with actions designed for that type.
Such **actions specific to a type are called _methods_**. 

_All variables consist of three things:_

* The name of the variable so it can be uniquely identified and accessed
* The "*type*" - what type of atom, container or object 
    * What data that type can store
    * What methods (if any) that this type comes with
* The information/data that the variable actually stores

A method is applied to the data in a variable like this: `x.method()`. 
A method can be something simple like making a `str` upper case, 
or something as advanced as running a machine learning model. 

---

**String methods:**

Strings are the Python atom whose methods you will use most often: upper/lower case, replacing letters, removing excess spaces etc. (`int` and `float` have a few methods too, but they are rarely needed.)

You can read about all string methods here: https://www.w3schools.com/python/python_ref_string.asp

In [None]:
# Define a new string
a = 'Hello world!'
print(a, type(a))

In [None]:
# Print the result of the method directly
print(a.upper())

In [None]:
# Store the results of upper() in new variable and then print
a_upper = a.upper()
print(a_upper,type(a_upper))

In [None]:
# Lower case
a.lower()
print(a,type(a)) # Why is there still a capital "H" in the output?

In [None]:
# Replace letters in a string
a_all_i = a.replace('o', 'i')
a_one_i = a.replace('o', 'i', 1)

print('a_all_i:',a_all_i,type(a_all_i))
print('a_one_i:',a_one_i,type(a_one_i))
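
Removing excess spaces is mentioned above but not shown, so here is a minimal sketch using the string method `.strip()`. It also shows that methods can be chained. The variable names `messy`, `clean` and `shout` are made up for this example.

```python
messy = '   Hello world!   '

# .strip() removes spaces at the start and end of the string
clean = messy.strip()
print('[' + clean + ']')

# Methods can be chained: each method is applied to the result of the previous one
shout = messy.strip().upper().replace('WORLD', 'PYTHON')
print(shout)

# String methods return a new string, the original string is unchanged
print('[' + messy + ']')
```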

---

Methods are different from operators (`+`, `-`) that we used with numbers. 
In addition to methods, each type can have support for these operators.

For `int` and `float` we have only used operators so far (they have a few methods as well, but those are rarely needed).
`str` has methods and supports some operators.

When you are concatenating strings (combining) you can use the `+` operator.


In [None]:
name = "Frodo Baggins"
age = 51

In [None]:
str_concat = "His name is " + name + " and his age is " + str(age) + "."
print(str_concat)

---

We can also use the `.format()` method to achieve this.

In this example we have the string `"His name is {} and his age is {}."`
and we are using the `.format()` method to populate the two `{}` placeholders.

This method turns the `int` into a `str` for you,
without you having to think about it.

In [None]:
str_format = "His name is {} and his age is {}.".format(name,age)
print(str_format)

---

The `.format()` works for shorter strings, but for longer strings and 
paragraphs of text where only a few words should be dynamically populated 
the better option is an `f-string`. 

We won't cover what an `f-string` is, 
but all you need to know is how to recognize it.

In [None]:
str_fstr = f"His name is {name} and his age is {age}."
print(str_fstr)
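
To confirm that the three approaches are interchangeable, the sketch below builds the same sentence in all three ways and compares the results. It only uses the names already introduced above.

```python
name = "Frodo Baggins"
age = 51

str_concat = "His name is " + name + " and his age is " + str(age) + "."
str_format = "His name is {} and his age is {}.".format(name, age)
str_fstr = f"His name is {name} and his age is {age}."

# All three approaches build exactly the same string
print(str_concat == str_format == str_fstr)
```

Only the `+` version needs `str(age)`. The other two convert the `int` for you.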

**Important error message: AttributeError**

Whenever you see an error where it says "has no attribute", as in `AttributeError: 'int' object has no attribute 'upper'`, then it means that the type `int` does not have a method or attribute called `upper`. 

An attribute is similar to a method, but an attribute only returns some metadata about a variable and cannot change the data in the variable.

If you get an AttributeError, test if you have misspelled the method/attribute or if the variable is of a different type than you expected. Below we get this error as we are using a `str` method on an `int` type variable.

In [None]:
x = 4
x = x.upper()
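
One possible way to investigate an AttributeError is sketched below. It uses the built-in function `hasattr()`, which is not covered in this session, to ask whether a value has a method or attribute with a given name. The variable `word` is made up for this example.

```python
x = 4
word = 'four'

# hasattr() reports whether a value has a method or attribute with a given name
print(hasattr(x, 'upper'))
print(hasattr(word, 'upper'))

# Converting the int to a str first makes the string method available
x_as_text = str(x).upper()
print(x_as_text, type(x_as_text))
```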

**Ex. 5a** (do ex 5a and 5b independently ~ 10 min)

In [None]:
# Use a string method on the variable p already provided,
# to create a variable ex5_x with the string "PYTHON"

p = 'Python'

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex5_x == 'PYTHON'

**Ex. 5b**

_Hint_: https://www.w3schools.com/python/python_ref_string.asp

In [None]:
# Use the variable ex5_x from ex 5a.
# Use a method to create the string "Python"
# (The solution has not been mentioned explicitly)

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex5_y == 'Python'

## S2.5 True/False variables - basic type: boolean

| Class name | Full name      | Name used       | Usage                        |
|:---        |:---            |:---             | :---                         |
| bool       | Boolean        | "boolean"       | Either true or false         |

A boolean is an atom in Python that is either `True` or `False`. Session 2 will discuss common uses of them. This session only covers how to identify them as you will see them often in Python.

### Examples of usage of booleans

* **Method responses**: So far we have only used string methods that return new strings, such as `upper()`, `lower()` and `replace()`. Many methods return booleans instead, such as `isnumeric()`, `islower()` etc.
* **If-conditions**: They are excellent to use to control if-else conditions.

---



In [None]:
# Generate a boolean
a = True
b = False

# Print the variables
print('Variable a:', a, type(a))
print('Variable b:', b, type(b))

In [None]:
# Get a boolean from a method

# Create a variable that is a string of a number
c = "42"
d = c.isnumeric()

# Print the variables
print('Variable c:', c, type(c))
print('Variable d:', d, type(d))
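
Booleans also come out of comparisons, and they can steer an if-else condition. The sketch below is only a preview of what Session 2 covers properly; the threshold `18` and the variable names are made up for this example.

```python
age = 51

# A comparison produces a boolean
is_adult = age >= 18
print('Variable is_adult:', is_adult, type(is_adult))

# A boolean can control an if-else condition
if is_adult:
    status = 'adult'
else:
    status = 'minor'
print('Variable status:', status)
```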

## S2.6 The None type - a variable exists but it contains nothing

| Class name | Full name      | Name used       | Usage                             |
|:---        |:---            |:---             | :---                              |
| NoneType   | None           | "none"          | An explicit way of saying nothing |

Sometimes we want to have a variable even if that variable is empty. 
This will prevent `NameError` due to a variable not existing, even when there is no information to store in that variable.

In [None]:
name = "Frodo Baggins"
age = 51

# employer has not been created yet, so this line gives a NameError
print(f"{name} (age {age}) is employed by {employer}")

In [None]:
# Frodo is unemployed
employer = None
print("employer",employer,type(employer))
print(f"{name} (age {age}) is employed by {employer}")

# Frodo gets a job at the World Bank
employer = "World Bank"
print("employer",employer,type(employer))
print(f"{name} (age {age}) is employed by {employer}")
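
Printing `None` inside a sentence reads a bit oddly, so code often tests for `None` first. A minimal sketch, assuming we want a different message for an unemployed person (the variable `message` is made up for this example):

```python
name = "Frodo Baggins"
employer = None

# "is None" is the usual way to test whether a variable holds None
print(employer is None)

if employer is None:
    message = f"{name} is currently unemployed"
else:
    message = f"{name} is employed by {employer}"
print(message)
```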

## S2.7 Basic data types summary

* The only way to store data in Python
* Stored in a variable with a name and type
* Which operations (`+`, `-`, etc..) or methods (`.upper()`) you can use depends on the type

**Important errors**

| Error name         | Likely reason for the error | 
|:---                |:---                     |
| **NameError**      | You have a typo when referencing a variable or you try to reference a variable before it is created | 
| **AttributeError** | You have used a method or an attribute on a type where that method or attribute does not exist | 



## S3.0 Functions and summary of operators and methods

### S3.1 Functions

Python has some built-in functions. Functions are, like methods, an action. But while a method is attached to a variable, like `x.method()`, you use a function directly in your code, like `function(x)`. 

You have already seen the functions `print()` and `type()`. 
They accept any type of variable. Some functions do not work on every type.

In [None]:
#Define two strings and one int
str_a = "Frodo"
str_b = "Gandalf"
int_a = 51

# Start by printing them
print('str_a:',str_a,type(str_a))
print('str_b:',str_b,type(str_b))
print('int_a:',int_a,type(int_a))

The function `len()` is used to get the length of a variable.
For a string that is the number of characters.

In [None]:
print('Number of characters in',str_a,":",len(str_a))
print('Number of characters in',str_b,":",len(str_b))

What do you think that the length of an `int` is?

In [None]:
print('The length of the int',int_a,"is:",len(int_a))
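
An `int` has no length, so `len()` gives a `TypeError` when used on one. If what you want is the number of digits, one option is to convert the `int` to a `str` first. A minimal sketch (the variable `digits_in_a` is made up for this example):

```python
int_a = 51

# Convert the int to a str, then count the characters in that string
digits_in_a = len(str(int_a))
print('Number of digits in', int_a, ':', digits_in_a)
```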

### S3.2 Summary of the types of "actions" you can take on Python data

| Action name | Examples      | Description    | Documentation |
|:---         |:---           |:---            |:---           |
| Operator    | `+`,`-` etc.  | Only used for basic actions. Most common to use with numeric types but works in some cases on other types as well. | https://www.w3schools.com/python/python_operators.asp |
| Method      | `x.method()`  | Always specific to a type (does not imply unique to a type). Used to interact with, modify or analyze the data in the variable. | Read the documentation for each  type of atom, container or object.       |
| Function    | `function(x)` | Some built in and it is common to make your own (tomorrow's session) | Built-in functions: https://www.w3schools.com/python/python_ref_functions.asp |

### S3.3 Where to find documentation for each type or object?

For example, what methods does a type have and what do they do?


In [None]:
# Basic information of the type
str?

In [None]:
# If you know the name of method you can use
str.upper?

In [None]:
# A not so user friendly but complete way of listing all attributes and methods that a type has
print(dir(str))

In [None]:
# This is str? and str.method? for all methods in dir(str) combined into a single output
help(str)

**But in reality**, most people use this to read the documentation: https://www.google.com/search?q=google+python+str+methods
        
Python is such a widely used language that there is always someone who has written a great guide for what you need to know, and Google helps you find it.

# S4.0 - Container types

So far we have only covered the atoms of Python.
We have not yet introduced how you 
combine the atoms into more useful molecules.

The basic data types `int`, `float`, `str` and `bool` can be combined in **the basic container types**.

| Class name | Full name  | Access                      | Occurrence  | Remarks |
|:---        |:---        | :---                        | :---        | :---
| list       | List       | Access items by order       | Common      | Since we access items by order, the order you add items to a list is important |
| dict       | Dictionary | Access items by key         | Common      | Since we access items by key names, the order is not important |
| tuple      | Tuple      | Access items by order       | Less common | Very similar to a list, but once created it cannot be modified |
| set        | Set        | Test if item already in set | Rare        | A container that cannot hold duplicates |

Containers can hold variables of basic data types (atoms)
as well as other container variables. 
You can mix data types and container types if needed.
Complex variables in Python are created by 
nesting many layers of container variables.

`list`s and `dict`s - 
we will cover lists and dictionaries properly 
as you will create and use them a lot. 

`tuple`s and `set`s - 
Tuples are often returned from functions and methods
so we will cover how to use them. 
We will only briefly cover sets.
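
To make the table above concrete before going into detail, here is a minimal sketch that creates one container of each type and accesses it the way the table describes. The variable names and the values stored are made up for this example.

```python
my_list = ['cat', 'dog', 'cat']          # ordered, can be modified
my_dict = {'name': 'Frodo', 'age': 51}   # items accessed by key
my_tuple = ('cat', 'dog', 'cat')         # ordered, cannot be modified
my_set = {'cat', 'dog', 'cat'}           # duplicates are dropped

print(type(my_list), my_list[0])         # access by order
print(type(my_dict), my_dict['name'])    # access by key
print(type(my_tuple), my_tuple[1])       # access by order
print(type(my_set), 'cat' in my_set)     # test if item is in the set
print(len(my_list), len(my_set))         # the set only kept one 'cat'
```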


## S4.1 Container types - Lists

| Class name | Full name  | Access                      | Occurrence  | Remarks |
|:---        |:---        | :---                        | :---        | :---
| list       | List       | Access items by order       | Common      | Since we access items by order, the order you add items to a list is important |

We can add variables to a list 
at the time of creating the list 
or we can add variables later. 

We access an item in the list by its order.
For example, the 3rd item, 7th item etc. 

However, items are accessed by index, 
and in computer science indexing usually starts at 0 and not 1. 
So the item with index 1 is actually the second item in the list. 

---

**Create a list:**

In [None]:
# Create a list of ints
list_int = [0,1,2,3,4,5,6,7,8,9]
print(list_int)

In [None]:
# Create a list of strings
list_str = ['a','b','c']
print(list_str)

In [None]:
# Create a mixed list
list_mix = [42,'Arthur',False]
print(list_mix)

In [None]:
# test the type of a list
print(type(list_mix))

**Access an item in a list**:

In [None]:
# Print list and print each item in the list
print('List list_mix:', list_mix, type(list_mix))
print('First item (index 0):', list_mix[0], type(list_mix[0]))
print('Second item (index 1):', list_mix[1], type(list_mix[1]))
print('Third item (index 2):', list_mix[2], type(list_mix[2]))

In [None]:
# Access item in list and store in variable
name = list_mix[1]
print('Variable name:', name, type(name))

In [None]:
# Accessing items using the index does not modify the list
print('List list_mix:', list_mix, type(list_mix))

**Access multiple items in a list:**

In [None]:
# Re-create list of ints
list_int = [0,1,2,3,4,5,6,7,8,9]

In [None]:
# Get all items between the item with index 0 
# up until but not including the item with index 3
# 0 ≤ index < 3
print(list_int[0:3])

In [None]:
# Index 0 is assumed if the first number is omitted
# So 0:3 is the same as :3
print(list_int[:3])

In [None]:
# 5 ≤ index < 7
print(list_int[5:7])

In [None]:
# 8 ≤ index < infinity
# All remaining items are included if the second number is omitted
print(list_int[8:])

In [None]:
# (number of items - 3) ≤ index < infinity
print(list_int[-3:])

In [None]:
# (number of items - 7) ≤ index < (number of items - 2)
print(list_int[-7:-2])
# 3 ≤ index < 8
print(list_int[3:8])
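
Two more variations on the same notation, shown as a short sketch: a single negative index counts from the end of the list, and an optional third number sets the step size.

```python
list_int = [0,1,2,3,4,5,6,7,8,9]

# A single negative index counts from the end of the list
print(list_int[-1])    # last item
print(list_int[-2])    # second to last item

# An optional third number is the step size
print(list_int[0:10:2])   # every second item
print(list_int[::3])      # every third item, from start to end
```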

**Important error message: IndexError**
    
Whenever you see an error where it says "index out of range", as in `IndexError: list index out of range`, then it means that you have tried to access an item in the list using an index that does not exist in the list.

In [None]:
# IndexError: list index out of range
print(list_int)
print(list_int[10])
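
A simple way to avoid an IndexError is to check how many items the list has. The sketch below uses the function `len()` from earlier in this session; the variable names `number_of_items` and `last_index` are made up for this example.

```python
list_int = [0,1,2,3,4,5,6,7,8,9]

# Since the index starts at 0, the last valid index is the number of items minus 1
number_of_items = len(list_int)
last_index = number_of_items - 1

print('Number of items:', number_of_items)
print('Last valid index:', last_index)
print('Last item:', list_int[last_index])

# Accessing several items with start:end does not give an IndexError,
# it just returns the items that exist
print(list_int[8:100])
```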

**Ex. 6a** (do ex 6a and 6b independently ~ 5 min)

In [None]:
# From the list digits, in one line of code,
# create the variable ex7_x with the list [0,1,2,3,4]

digits = [0,1,2,3,4,5,6,7,8,9] 

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex7_x == [0,1,2,3,4]

**Ex. 6b**

In [None]:
# From the list digits, in one line of code,
# create the variable ex7_y with the list [5,6,7,8]

digits = [0,1,2,3,4,5,6,7,8,9] 

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex7_y == [5,6,7,8]

**Edit a list:**

So far, every time we have modified a variable we have used a `=`. For example `x = x + 1` or `name = list_mix[1]`. 

Lists have some _in-place_ methods, meaning methods that modify the list itself.

You can find more list methods here: https://www.w3schools.com/python/python_ref_list.asp

In [None]:
# Create a list of strs
pets = ['cat','dog']
print('Variable pets:', pets, type(pets))

# Add one item to the list - .append() is an in-place operator
pets.append('gold fish')
print('Variable pets:', pets, type(pets))

# Note that we did not do: pets = pets.append('gold fish')

In [None]:
# Add another item to the list using in-place .append() and the "=" assign operator
pets_append_return = pets.append('butterfly')
print('Variable pets:', pets, type(pets))
print('Variable pets_append_return:', pets_append_return, type(pets_append_return))

In [None]:
# Re-create the list of pets
pets = ['cat', 'dog', 'gold fish', 'butterfly']

# Print item with index 3 in original list
print('Print pet at index 3:', pets[3], type(pets[3]))

# Add item at index 2
pets.insert(2,'parrot')

# Print item with index 3 again
print('Print pet at index 3:', pets[3], type(pets[3]))

#Print all pets
print('Variable pets:', pets, type(pets))

In [None]:
# Modify item with index 1 
pets[1] = 'wolf'
print('Variable pets:', pets, type(pets))

In [None]:
# Erase item in list by index. Item returned
pet_pop = pets.pop(3)
# Erase item in list by value. None returned
pet_remove = pets.remove("cat")

# Print results
print('Variable pet_pop:', pet_pop, type(pet_pop))
print('Variable pet_remove:', pet_remove, type(pet_remove))
print('Variable pets:', pets, type(pets))

**Work with lists:**

In [None]:
# Create two lists.
odds = [1,3,5,7,9]
evens = [0,2,4,6,8]

# Combine them into one new list
all_nums = odds + evens
print('Variable all_nums:', all_nums, type(all_nums))

# Sort the list - note that .sort() is an in-place operator
all_nums.sort()
print('Variable all_nums:', all_nums, type(all_nums))
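
As a complement to the in-place `.sort()` above, here is a small sketch of the built-in function `sorted()`, which returns a new list and leaves the original unchanged. The list `scores` is a hypothetical example that is not used elsewhere in this session.

```python
# Hypothetical example list
scores = [3, 1, 2]

# sorted() returns a new sorted list and leaves the original list unchanged
scores_sorted = sorted(scores)
print('Variable scores:', scores, type(scores))
print('Variable scores_sorted:', scores_sorted, type(scores_sorted))

# .sort() changes the list itself and returns None
sort_return = scores.sort()
print('Variable scores:', scores, type(scores))
print('Variable sort_return:', sort_return, type(sort_return))
```

Which one to use depends on whether you still need the list in its original order.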

In [None]:
# Create a list of lists
l1 = ['a','b','c']
l2 = ['d','e','f']
l3 = ['g','h','i']

# Create the list of lists
nested_list = [l1,l2,l3]
print('Variable nested_list:', nested_list, type(nested_list))

In [None]:
# Access the item "f" in nested_list - multiple lines
nested_lvl1 = nested_list[1]
print('Variable nested_lvl1:', nested_lvl1, type(nested_lvl1))
nested_f    = nested_lvl1[2]
print('Variable nested_f:', nested_f, type(nested_f))

# Access the item "f" in nested_list - single line
f = nested_list[1][2]
print('Variable f:', f, type(f))

In [None]:
# Start with an empty list
sample_means = []

# Add items to the list
sample_means.append(23.45)
sample_means.append(45.1)
sample_means.append(28.62)

print('Variable sample_means:', sample_means, type(sample_means))

In [None]:
# Create a list by repeating another list
list_a = ['a'] 
list_a5 = list_a * 5
list_abc3 = ['a','b','c'] * 3

print('Variable list_a:', list_a, type(list_a))
print('Variable list_a5:', list_a5, type(list_a5))
print('Variable list_abc3:', list_abc3, type(list_abc3))
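
One caveat when repeating lists, sketched below with the hypothetical variables `row`, `grid` and `grid_copies`: when the repeated item is itself a list, the repetition stores the same inner list several times rather than making copies. Editing one of them then appears to edit all of them.

```python
# Hypothetical example: repeat a list that contains another list
row = [0, 0]
grid = [row] * 3

# All three items in grid are the same list, so one edit shows up three times
grid[0][0] = 1
print('Variable grid:', grid, type(grid))

# Using .copy() creates independent inner lists
grid_copies = [row.copy(), row.copy(), row.copy()]
grid_copies[0][1] = 5
print('Variable grid_copies:', grid_copies, type(grid_copies))
```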

**Get info about a list:**

We can use the same function `len()` we used for strings

In [None]:
print('Number of items in list_a:', len(list_a))
print('Number of items in list_a5:', len(list_a5))

In [None]:
# We can store the length of a list in a variable if needed
len_list_abc3 = len(list_abc3)
print('Number of items in list_abc3:', len_list_abc3, type(len_list_abc3))

Test if an item is or isn't in a list

In [None]:
list_abc = ['a','b','c']

a_in_list_abc = 'a' in list_abc
d_not_in_list_abc = 'd' not in list_abc

print('Variable list_abc:', list_abc, type(list_abc))
print('Variable a_in_list_abc:', a_in_list_abc, type(a_in_list_abc))
print('Variable d_not_in_list_abc:', d_not_in_list_abc, type(d_not_in_list_abc))

**Ex. 7a** (do ex 7a, 7b, 7c, 7d and 7e independently ~ 8 min, 15 if including advanced)

In [None]:
# From the list digits, in one line of code,
# create the variable ex8_x with the int 6

digits = [[0,1,2],[3,4,5],6,[7,8,9]]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_x == 6

**Ex. 7b**

In [None]:
# From the list digits, in one line of code,
# create the variable ex8_y with the int 4 

digits = [[0,1,2],[3,4,5],6,[7,8,9]]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_y == 4

**Ex. 7c**

In [None]:
# Using only the variables a, b and c and list methods
# create a list [1,2,3] and store it in the variable ex8_k.
# You may not define any new ints or lists

a = 2
b = [1,4,3]
c = 1

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_k == [1,2,3]

**Ex. 7d** (advanced)

In [None]:
# Using only the variables a, b and c and list methods
# create a list [1,2,3,4] and store it in the variable ex8_z. 
# You may not define any new ints or lists

a = 4
b = [2,3]
c = [1]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_z == [1,2,3,4]

**Ex. 7e** (advanced)

In [None]:
# Using only the variables a, b and c and list methods
# create a list [1,2,3] and store it in the variable ex8_i.
# You may not define any new ints or lists

a = 3
b = [1,4,2]
c = 1

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_i == [1,2,3]

## S4.2 Container types - Dictionaries

| Class name | Full name  | Access                      | Occurrence  | Remarks |
|:---        |:---        | :---                        | :---        | :---
| dict       | Dictionary | Access items by key         | Common      | Since we access items by key names, the order is not important |

Each item in a dictionary consists of two things. The item itself and a key used to refer to it. 

The item can be of any type (anything from atoms to advanced molecules). The key is most often a string, but it can be any immutable (hashable) type, such as an int or a tuple.
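
As a small aside on key types: strings are by far the most common choice, but the sketch below, which uses a hypothetical dict `mixed_keys`, shows that ints and tuples also work as keys, while a list does not because it is mutable. The `try`/`except` construct is only used here so that the cell runs to the end. It has not been covered yet in this session.

```python
# Hypothetical example dict with keys of different types
mixed_keys = {}
mixed_keys['name'] = 'str key'
mixed_keys[42] = 'int key'
mixed_keys[(1, 2)] = 'tuple key'
print('Variable mixed_keys:', mixed_keys, type(mixed_keys))

# A list cannot be used as a key since lists are mutable
try:
    mixed_keys[[1, 2]] = 'list key'
except TypeError as error:
    print('TypeError:', error)
```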

---

**Create dictionaries and access items:**

In [None]:
# Create a dictionary
x = {'a':'alpha','b':3,'c':True,'d':[1,2,3]}
print('Variable x:', x, type(x))

In [None]:
# Access item in a dict using the key
print("Variable x['a']:", x['a'], type(x['a']))
print("Variable x['b']:", x['b'], type(x['b']))
print("Variable x['c']:", x['c'], type(x['c']))
print("Variable x['d']:", x['d'], type(x['d']))

In [None]:
# Let's say we are a bank keeping track of info about accounts

# Start with an empty dict
accounta = {}
accountb = {}

# Set up account A details
accounta['owner'] = 'Jerry Ehman'
accounta['id'] = '6EQUJ5'

# Set up account B details in different order
accountb['id'] = 'GTCTAT'
accountb['owner'] = 'Rosalind Franklin'

print('Variable accounta:', accounta, type(accounta))
print('Variable accountb:', accountb, type(accountb))

In [None]:
# The same value can be accessed with the same key regardless of the order the values were added
print('Owner account A:', accounta['owner'], type(accounta['owner']))
print('Owner account B:', accountb['owner'], type(accountb['owner']))

In [None]:
# Deposit initial amount on account A
accounta['balance'] = 1420
print('Balance account A:', accounta['balance'], type(accounta['balance']))

**Important error message: KeyError**
    
Whenever you see an error of the format
`KeyError: 'balance'`, 
it means that you have tried to access an item in a dictionary using a key that does not exist in that dictionary.

In [None]:
print('Balance account B:', accountb['balance'], type(accountb['balance']))

In [None]:
# When applicable, use the .get() method to return a default value if the key does not exist
print('Balance account A:', accounta.get('balance',0))
print('Balance account B:', accountb.get('balance',0))

In [None]:
# Using .get() without a default value on a key that does not exist returns None
print('Color account B:', accountb.get('color'), type(accountb.get('color')))
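
Another way to avoid a `KeyError` is to test whether a key exists with `in`, the same operator used for lists earlier. The sketch below uses a hypothetical dict `account_demo`. It also shows that `.get()` only returns the default value and does not add the key to the dict.

```python
# Hypothetical example account without a balance
account_demo = {'owner': 'Jerry Ehman', 'id': '6EQUJ5'}

# Test if a key exists in the dict
has_owner = 'owner' in account_demo
has_balance = 'balance' in account_demo
print('Variable has_owner:', has_owner, type(has_owner))
print('Variable has_balance:', has_balance, type(has_balance))

# .get() returns the default value but does not store it in the dict
balance = account_demo.get('balance', 0)
print('Variable balance:', balance, type(balance))
print('Variable account_demo:', account_demo, type(account_demo))
```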

**Ex. 8a** (do ex 8a and 8b independently ~ 10 min)

In [None]:
# Using only the already defined variables,
# (you may not type any keys manually)
# modify the empty dict ex8_z into
# {'pet1':'Dog','pet2':'Cat'}
# You may not overwrite ex8_z

p1 = 'pet1'
p2 = 'pet2'
Arthur = 'Cat'
b = "Dog"
c = Arthur
ex8_z = {}

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_z == {'pet1':'Dog','pet2':'Cat'}

**Ex. 8b**

In [None]:
# Using only the complex_dict and 
# accessing items using only keys and indexes,
# create the following variables: zero as the int 0,
# d as the string 'd', minus_three as the int -3
# and symbol_list as the list ['%','?','~']
# Try to create each of these variables in a single line of code


complex_dict = {
    'alpha': [
        'a','b','c','d'
    ],
    'numbers': [
        [1,2,3],
        0,
        [-1,-2,-3]
    ],
    'symbols' : {
        'percent' : '%',
        'question' : '?',
        'tilde' : '~'
    }
}

zero = ### ADD YOUR CODE HERE
d = ### ADD YOUR CODE HERE
minus_three = ### ADD YOUR CODE HERE
symbol_list = ### ADD YOUR CODE HERE

# === Do not modify code below ===
assert zero==0 and d=='d' and minus_three==-3 and symbol_list==['%','?','~']

**Get the keys and/or the items of a dictionary**

Containers are great at holding variables with data in a structured way. 
It is often the case that we want to access the items in a container one at a time.

We will cover loops in the next sessions, but we will already now cover
three methods that are important when looping over the key-value pairs of a dict.


In [None]:
# Define a dictionary with countries and capitals
capitals = {
    'China':'Beijing',
    'India':'New Delhi',
    'Ohio':'Columbus',
    'Peru':'Lima',
    'Sweden':'Stockholm',
}

# We can get all keys or all values using the methods .keys() and .values()
print("capitals.keys():", capitals.keys(), type(capitals.keys()))
print("capitals.values():", capitals.values(), type(capitals.values()))

In [None]:
# Get all keys in a dictionary
for country in capitals.keys():
    print(country)

In [None]:
# Get all values in a dictionary
for capital in capitals.values():
    print(capital)

In [None]:
# Get both keys and values in a dictionary
for country, capital in capitals.items():
    print(f"The capital of {country} is {capital}.")
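
The objects returned by `.keys()`, `.values()` and `.items()` are not lists, but they can be type cast to lists when we need to access items by index. A minimal sketch, using a hypothetical dict `capitals_demo`:

```python
# Hypothetical example dict with two countries
capitals_demo = {'Peru': 'Lima', 'Sweden': 'Stockholm'}

# Type cast the keys and the values to lists
country_list = list(capitals_demo.keys())
capital_list = list(capitals_demo.values())
print('Variable country_list:', country_list, type(country_list))
print('Variable capital_list:', capital_list, type(capital_list))

# Each item from .items() is a tuple with the key and the value
first_pair = list(capitals_demo.items())[0]
print('Variable first_pair:', first_pair, type(first_pair))
```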


## S4.3 Container types - Tuples

| Class name | Full name | Access                | Occurrence  | Remarks |
|:---        |:---       | :---                  | :---        | :---
| tuple      | Tuple     | Access items by order | Less common | Very similar to a list, but once created it cannot be modified |

At first glance, tuples are very similar to lists. Items in tuples are also accessed using indexes.

The main difference is that tuples are immutable, which means you cannot edit them once they are created.

In data work we usually want to be able to edit our data, so you will not create them often. But it is common that methods and functions return tuples.
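
One example of a function that returns a tuple is the built-in `divmod()`, which returns the quotient and the remainder of an integer division. The sketch below also shows how the items of a tuple can be stored directly in separate variables. The variable names are hypothetical.

```python
# divmod() returns a tuple with the quotient and the remainder
result = divmod(17, 5)
print('Variable result:', result, type(result))

# Access the items by index, just like in a list
print('Quotient:', result[0], type(result[0]))
print('Remainder:', result[1], type(result[1]))

# Or store the items directly in one variable each
quotient, remainder = divmod(17, 5)
print('Variable quotient:', quotient, type(quotient))
print('Variable remainder:', remainder, type(remainder))
```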

--- 

**Create a tuple:**

In [None]:
list_mix  = [42,'Arthur',False]
tuple_mix = (42,'Arthur',False)

print('list_mix:', list_mix, type(list_mix))
print('tuple_mix:', tuple_mix, type(tuple_mix))

**Access item in a tuple:**

In [None]:
# Items are accessed the same way as a list
print('list_mix item index 0:', list_mix[0], type(list_mix[0]))
print('tuple_mix item index 0:', tuple_mix[0], type(tuple_mix[0]))

**Modify item in a tuple:**

In [None]:
# First modify item in list
list_mix[1] = 'Marvin' 
print('list_mix:', list_mix, type(list_mix))


**Important error message: TypeError**
    
Whenever you see an error that says "does not support", as in 
`TypeError: 'tuple' object does not support item assignment`, 
it means that you have tried to perform an action on a type for which that action is not allowed.
In this example, the action is modifying an item in a tuple.

In [None]:
# However, we cannot do the same in a tuple
tuple_mix[1] = 'Marvin'

In [None]:
# Use tuples to get information and store them in variables
name = tuple_mix[1]
print('Variable name:', name, type(name))

# The variable name can be given a new value; the tuple itself is not changed
name = name.upper()
print('Variable name upper:', name, type(name))

In [None]:
# If we type cast a tuple to a list it behaves as a list
list_from_tuple = list(tuple_mix)
list_from_tuple[1] = 'Marvin'
print('Variable list_from_tuple:', list_from_tuple, type(list_from_tuple))

In [None]:
# While a tuple cannot be modified, a new tuple can be created by type casting a list
tuple_from_list = tuple(list_from_tuple)
print('Variable tuple_from_list:', tuple_from_list, type(tuple_from_list))

## S4.4 Container types - Sets

| Class name | Full name  | Access                      | Occurrence  | Remarks |
|:---        |:---        | :---                        | :---        | :---
| set        | Set        | Test if item already in set | Rare        | A container that cannot hold duplicates |

Sets are rare, but they are included here for completeness. A set is a container that cannot hold duplicates. 

---

**Demonstrate a set:**

In [None]:
# Create a set that stores all skills in a team:
team_skills = set()
print('Variable team_skills:', team_skills, type(team_skills))

In [None]:
# Define skillsets for person A, B and C
personA_skill1 = 'python'
personB_skill1 = 'field-work'
personC_skill1 = 'python'
personC_skill2 = 'Excel'

# Add skillsets for person A, B and C
team_skills.add(personA_skill1)
team_skills.add(personB_skill1)
team_skills.add(personC_skill1)
team_skills.add(personC_skill2)
print('Variable team_skills:', team_skills, type(team_skills))

In [None]:
# Add a list of person D's skills to the set
personD_skills = ['python','field-work','management','accounting']
team_skills.update(personD_skills)
print('Variable team_skills:', team_skills, type(team_skills))

In [None]:
# Test if the team has a skillset
print('python' in team_skills)
print('R' in team_skills)
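
Sets can also be compared with each other. The sketch below uses the hypothetical sets `team_a` and `team_b` to show the operators `&` (items in both sets), `|` (items in either set) and `-` (items only in the first set), as well as type casting a list to a set to drop duplicates.

```python
# Hypothetical example sets with the skills of two teams
team_a = {'python', 'field-work', 'Excel'}
team_b = {'python', 'management'}

# Skills both teams have
shared = team_a & team_b
# Skills at least one of the teams has
combined = team_a | team_b
# Skills only team A has
only_a = team_a - team_b

print('Variable shared:', shared, type(shared))
print('Variable combined:', combined, type(combined))
print('Variable only_a:', only_a, type(only_a))

# Type cast a list to a set to remove duplicates
unique_cities = set(['Lima', 'Lima', 'Stockholm'])
print('Variable unique_cities:', unique_cities, type(unique_cities))
```

Note that a set has no order, so the items may be printed in a different order each time.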

## S4.5 Basic container types summary

* This is how we combine basic data types (atoms)
* Stored in a variable with a name and type (just like data types and any other variable)
* How you access items is the main difference between the two most common containers: `list` (by index) and `dict` (by key)
* Containers are often nested (a container in a container)
* The types in a container can be mixed

**Important errors**

| Error name     | Likely reason for the error | 
|:---            |:---                         |
| **IndexError** | You are trying to access an item in a list using an index outside the range of indexes for that list | 
| **KeyError**   | You are trying to access an item in a dict using a key name that does not exist in the dict | 
| **TypeError**  | Very generic error where you have used an operation (math operation, index assignment etc.) that is a recognized operation but not allowed for this variable's type | 
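
The three errors in the table can be triggered on purpose, as in the sketch below. It uses hypothetical example containers and the `try`/`except` construct, which has not been covered in this session, so that the cell keeps running after each error.

```python
# Hypothetical example containers
demo_list = [1, 2, 3]
demo_dict = {'a': 1}
demo_tuple = (1, 2, 3)

# Collect the names of the errors we run into
errors_seen = []

try:
    demo_list[10]          # index outside the range 0 to 2
except IndexError as error:
    errors_seen.append(type(error).__name__)
    print('IndexError:', error)

try:
    demo_dict['b']         # key that does not exist in the dict
except KeyError as error:
    errors_seen.append(type(error).__name__)
    print('KeyError:', error)

try:
    demo_tuple[0] = 99     # tuples do not support item assignment
except TypeError as error:
    errors_seen.append(type(error).__name__)
    print('TypeError:', error)

print('Variable errors_seen:', errors_seen, type(errors_seen))
```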

**Optional Ex. 9a**

In [None]:
# Generate a tuple with the three letters m, n and p
# in alphabetic order. Call the tuple letters.
# then access the letter m
# and store as a string in the variable M

letters = ### ADD YOUR CODE HERE
M = ### ADD YOUR CODE HERE

# === Do not modify code below ===
assert type(letters) is tuple and M=='m'