# **Built-In Data Structures**

Python also has several built-in compound types, which act as containers for other types.

| Type Name | Example                                    | Description                                           |
|-----------|--------------------------------------------|-------------------------------------------------------|
| ``list``  | ``['Lisa', 24, '2X']``                     | Ordered collection                                    |
| ``tuple`` | ``('Lisa', 24, '2X')``                     | Immutable ordered collection                          |
| ``dict``  | ``{'name':'Lisa', 'ID':24, 'Class':'2X'}`` | (key, value) mapping; keeps insertion order since 3.7 |
| ``set``   | ``{'Lisa', 'Ellie', 'Jon'}``               | Unordered collection of unique values                 |

Note that the brackets `()`, `[]`, and `{}` are used to set the type of structure.
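As a quick check of how the brackets determine the type, the sketch below asks Python for the type of a few literals. The variable names are purely illustrative. Two edge cases are worth knowing: empty curly braces create a `dict`, not a `set`, and a one-element tuple needs a trailing comma.

```python
# The bracket style decides which container is created
empty_list = []
empty_tuple = ()
empty_dict = {}        # empty curly braces give a dict, not a set
empty_set = set()      # an empty set has to be created with set()

print(type(empty_list), type(empty_tuple), type(empty_dict), type(empty_set))

# A one-element tuple needs a trailing comma; parentheses alone only group
not_a_tuple = (24)
one_tuple = (24,)
print(type(not_a_tuple), type(one_tuple))
```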

Later on we will also look at NumPy arrays, which are the preferred class for large numeric datasets.

## <u>Lists</u>
Lists are the basic **ordered** and **mutable** data collection type in Python.

In [None]:
a = []             # create an empty list
a = [2, 3, 5, 7]   # create a list with integers

Lists have a number of useful properties and methods available to them:

* Length of a list

In [None]:
len(a)

* Append (method) element to the end of the list

In [None]:
a.append(11)
a

* Addition (+) concatenates lists

In [None]:
a + [13, 17, 19]

* Sort (method) the elements in lists in place
    * `.sort()` method can also be used to alphabetize a list of strings

In [None]:
a = [2, 5, 1, 6, 3, 4]
a.sort()
a

In [None]:
b = ['horse','llama', 'zebra', 'guanaco']
b.sort()
b
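Because `.sort()` works in place, it changes the original list and returns `None`. If you want to keep the original order, the built-in function `sorted()` returns a new sorted list instead. The following is a small sketch of the difference; the list `animals` is made-up example data.

```python
# Hypothetical example data
animals = ['horse', 'llama', 'zebra', 'guanaco']

# sorted() returns a new sorted list and leaves the original untouched
animals_sorted = sorted(animals)
print(animals_sorted)
print(animals)

# .sort() reorders the list in place and returns None
result = animals.sort()
print(result)
print(animals)
```

A common mistake is writing `animals = animals.sort()`, which replaces the list with `None`.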

One of the powerful features of Python's compound objects is that they can contain a mix of objects of *any* type.

In [None]:
a = [1, 'two', 3.14, [0, 3, 5], False]
a

This flexibility is a consequence of Python's dynamic type system.
Creating such a mixed sequence in a statically-typed language like C can be much more of a headache!
We see that lists can even contain other lists as elements.
Such type flexibility is an essential piece of what makes Python code relatively quick and easy to write.


### 1. Exercises: List creation

**Ex 1.1**: Create and print a list with the integer numbers 2015 to 2022.

In [None]:
# Create list

**Ex 1.2**: Create a list with the names of countries of the United Kingdom

In [None]:
# Create list

**Ex 1.3**:
* Create two separate lists, each with dates formatted as `[year, month, day]`
    * e.g., `[1789,7,14]` could be the first list
* Create and print a list that contains the two previous lists (a list of lists!)

In [None]:
# Create two separate lists

In [None]:
# Create a list of the two above lists

## <u>List indexing and slicing</u>
Python provides access to elements in compound types through *indexing* for single elements, and *slicing* for multiple elements.

In [None]:
a = [2, 3, 5, 7, 11]

Python uses *zero-based* indexing, so we can access the first and second elements using the following syntax:

In [None]:
a[0]

In [None]:
a[1]

Elements at the end of the list can be accessed with negative numbers, starting from -1:

In [None]:
a[-1]

In [None]:
a[-2]

You can visualize this indexing scheme this way:

![List Indexing Figure](../figures/list-indexing.png)

Here values in the list are represented by large numbers in the squares; list indices are represented by small numbers above and below.
In this case, ``L[2]`` returns ``5``, because that is the value at index ``2``.

While **indexing** is a means of fetching a single value from the list, **slicing** is a means of accessing multiple values - a sub-list.
To slice a part of a list, we can use a **colon** (:) to indicate the start point (inclusive) and end point (non-inclusive) of the sub-list. You can think of the colon as shorthand for **everything in between**.


For example, to get the 2nd, 3rd and 4th elements of the list `a`, or `a[1]`, `a[2]` and `a[3]`, we can write:

In [None]:
a[1:4]

If we leave out the start index on the left side of the colon, it defaults to 0, so the following returns the first four elements (everything before the 5th element):

In [None]:
a[:4]

Similarly, if we leave out the last index, it defaults to the length of the list.
Thus, the last 2 elements can be accessed as follows:

In [None]:
a[3:]
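Indexing and slicing also differ in how they treat positions outside the list. The sketch below, using an illustrative list `nums` with the same values as `a`, shows that a slice is clipped to the available elements and returns a new list, while indexing a single element past the end raises an error.

```python
nums = [2, 3, 5, 7, 11]   # same values as the list `a` above

# A slice that runs past the end is clipped to the list length
print(nums[3:100])

# A slice returns a new list, so changing it leaves the original unchanged
first_three = nums[:3]
first_three[0] = -1
print(first_three, nums)

# Indexing a single element past the end raises an IndexError
try:
    nums[100]
except IndexError as err:
    print("IndexError:", err)
```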

Finally, it is possible to specify a third integer that represents the step size; for example, to select every second element of the list, we can write:

In [None]:
a[0:len(a):2]
# or a[0:5:2]

This can also be shortened to:

In [None]:
a[::2]

A particularly useful version of this is to specify a negative step, which will reverse the list:

In [None]:
a[::-1]

Both indexing and slicing can be used to set elements, just as they are used to access them.
The syntax is as you would expect:

In [None]:
a[0] = 100
print(a)

In [None]:
a[1:3] = [55, 56]
print(a)
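Slice assignment does not require the replacement to have the same length as the slice, so it can also grow or shrink a list. A minimal sketch with a made-up list `letters`:

```python
letters = ['a', 'b', 'c', 'd', 'e']

# Replace two elements with three: the list grows by one
letters[1:3] = ['X', 'Y', 'Z']
print(letters)

# Assigning an empty list to a slice deletes those elements
letters[1:4] = []
print(letters)
```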

### 2. Exercises: List indexing

**Ex 2.1**: Using the list with values 2015 to 2022,

```python
years = [2015,2016,2017,2018,2019,2020,2021,2022]
```        
       
* Print the first two values in the list.
* Print the even years in the list.
* Print the two last values in the list.

In [None]:
# First two values

In [None]:
# Even years

In [None]:
# Last two values

## <u>Methods and Attributes of Class List</u> 

Every list variable (i.e. every instance of Class List) has predefined attributes and methods. 
Remember that these can be called by using the dot notation: 
type the name of a list variable with a dot afterwards and press Tab to open a menu with the possibilities. 

`mylist.`(press Tab)

In the exercises below we will use methods `.insert()`, `.append()` and `.remove()`.

You can also list all the methods and attributes of an object by using the built-in function `dir()`:

In [None]:
dir(a)
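To see how the three methods mentioned above behave before trying the exercises, here is a minimal sketch on a made-up list called `readings`. Note that `.insert()` takes the target index first and the value second, and `.remove()` deletes the first element equal to its argument.

```python
readings = [1.5, 2.5, 'n/a', 4.5]   # hypothetical data

readings.append(5.5)      # add a value at the end
readings.insert(2, 3.5)   # insert 3.5 so that it ends up at index 2
readings.remove('n/a')    # remove the first element equal to 'n/a'
print(readings)
```

If the value passed to `.remove()` is not in the list, Python raises a `ValueError`.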

### 3. Exercises: Methods and Attributes of Class List

**Ex 3.1**: 
* Copy and paste the following list into the cell below:

```python
data = [10,12,13,18,'error',22]
```

* Append the number 24 to the list and print.
* Insert the number 15 to the list such that the elements are in ascending order and print.
    * Hint: don't forget Python indexing starts at 0!
* Remove the `'error'` string in the list and print.

In [None]:
# Copy and paste data

In [None]:
# Append 24 to the list

In [None]:
# Insert 15 to the list

In [None]:
# Remove 'error' from the list

**Ex 3.2**:
* Copy and paste the following list into the cell below:
```python
countries = ['Hungary','Montenegro','Turkey','Finland','Lithuania']
```        
* Alphabetize the list and print.
* Reverse the order of the list and print.

In [None]:
# Copy and paste data

In [None]:
# Alphabetize the list

In [None]:
# Reverse the order of the list (hint: there are multiple ways to do this)

## <u>Working with lists: List comprehensions</u>

**List comprehensions** allow us to perform an operation on every **member** of an iterable and save the results in a new list. If used judiciously, they can result in concise and highly readable statements. 

Syntax:

```python
new_list = [expression for member in iterable]
```

The equivalent `for` loop would be:

```python
new_list = [] # initialize new_list as empty list
for member in iterable: # loop through every member in iterable
    new_list.append(expression) # on each iteration, append the value of expression to new_list
```

**expression** - a method, operation, or any other valid expression that returns a value.

**member** - the object or value in the list or iterable.

**iterable** - a list, set, tuple, generator, or any other object that 
can return its elements one at a time.

In [None]:
dates = ['2001-01-31','2002-02-28','2003-03-31','2004-04-30']
newdates = [date+' 00:00' for date in dates]
newdates

In the example above, can you identify the expression, member, and iterable?

Here is another example that uses the string method `.split()`, takes the second element with index `[1]`, and finally converts the month string into an integer with `int()`. 

In [None]:
# first let's test how .split() and int() work
print( dates[0] )
print( dates[0].split('-') )
print( dates[0].split('-')[1] )
print( int(dates[0].split('-')[1]) )


In [None]:
# Now we will build the list comprehension
months = [int(date.split('-')[1]) for date in dates]
months
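To make the link between a comprehension and its `for` loop concrete, the sketch below extracts the year from each date both ways and checks that the results agree. The names `dates_example`, `years_lc`, and `years_loop` are illustrative only.

```python
dates_example = ['2001-01-31', '2002-02-28', '2003-03-31', '2004-04-30']

# List comprehension: extract the year of each date as an integer
years_lc = [int(date.split('-')[0]) for date in dates_example]

# Equivalent for loop
years_loop = []
for date in dates_example:
    years_loop.append(int(date.split('-')[0]))

print(years_lc)
print(years_lc == years_loop)
```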

## <u>Conditional list comprehension</u>

A list comprehension can also include a condition that selects which members are used, which makes it a **conditional** list comprehension:

```python
new_list = [expression for member in iterable if conditional]
```

The equivalent `for` loop would be:

```python
new_list = [] # initialize new_list as empty list
for member in iterable: # loop through every member in iterable
    if conditional: # for each member, check if conditional is true
        new_list.append(expression) # if conditional is true for member, append the value of expression to new_list
```

In the following example:

* `dataID` is a list containing the unique identifiers for each row of data in some dataset.

* The identifiers are based on the country (SP = Spain, US = United States, FR = France, MX = Mexico) and year.

In [None]:
dataID = ['SP2012','US2014','FR2016','SP2013','FR2013','MX2019']

We would like to extract the complete IDs, but only for Spain's data.

In [None]:
dataID_SP = [ID for ID in dataID if 'SP' in ID]
print(dataID_SP)

In the following example:

* `T_F` is a list containing temperature data, in Fahrenheit.

In [None]:
T_F = [81,86,76,'error',70,93,'error']

We would like to convert all data points that are not errors into Celsius.

In [None]:
T_C = [(T-32)*(5/9) for T in T_F if isinstance(T, int)]
# or T_C = [(T-32)*(5/9) for T in T_F if (T!='error')]
T_C

What if we wanted to keep the errors?
* The `if-else` must be a part of the expression. 
* Thus, we must specify the conditional **before** the `for member in iterable` statement. 

In [None]:
T_C = [(T-32)*(5/9) if isinstance(T, int) else T for T in T_F]
T_C
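The two forms can be combined in one comprehension: an `if ... else` before the `for` decides *what* is stored, while an `if` after the iterable decides *whether* a member is stored at all. The sketch below uses made-up data and an arbitrary placeholder value of `-999`; both are assumptions for illustration.

```python
raw = [81, 86, 'error', 70, None, 93]   # hypothetical data

# Trailing `if` filters: None entries are dropped entirely.
# Leading `if ... else` transforms: 'error' strings become -999.
cleaned = [-999 if r == 'error' else r for r in raw if r is not None]
print(cleaned)
```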

### 4. Exercises: List comprehensions

#### Ex 4.1: Unit Conversion
* The following list contains temperature data in Celsius: 
        T_C = [15,7,11,20,13,4]
* Copy and paste the line above into the cell below and convert the data in `T_C` into Kelvin.
    * Note: $T_{Kelvin} = T_{Celsius} + 273.15$

In [None]:
T_C = [15,7,11,20,13,4]
T_K = [temp+273.15 for temp in T_C]
T_K

#### Ex 4.2: Data Selection
* Using the variable `dataID` below, extract all the data IDs for the year 2013.
```python
dataID = ['SP2012','US2014','FR2016','SP2013','FR2013','MX2019']
```

#### Ex 4.3: Data Cleaning
* Using the variable `T_C` from Ex 4.1 (copied here for your convenience), extract all values above 10 degrees.
```python
T_C = [15,7,11,20,13,4]
```

## <u>Tuples</u>
Tuples are in many ways similar to lists, but they are defined with parentheses.

The main distinguishing feature of tuples is that they are **immutable**:
* Once they are created, their size and contents cannot be changed; nor can you change the order of the elements in them.

Tuples are used often in Python programs, for example in functions that have multiple return values.

In [None]:
t = (1, 2, 3)

They can also be defined without any brackets at all:

In [None]:
t = 1, 2, 3
print(t)

Like the lists discussed before, tuples have a length, and individual elements can be extracted using square-bracket indexing. But unlike lists, once created it is not possible to modify them.

In [None]:
len(t)

In [None]:
t[0]

In [None]:
t[0] = 4  # raises a TypeError because tuples are immutable
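As a sketch of the multiple-return-value use mentioned earlier, the hypothetical helper `min_and_max` below returns two values, which Python packs into a tuple. The tuple can then be unpacked into separate names. The last part catches the error raised by an attempted modification so that the cell runs to completion.

```python
def min_and_max(values):
    """Return the smallest and largest value as a tuple (hypothetical helper)."""
    return min(values), max(values)

result = min_and_max([3, 1, 4, 1, 5])
print(result)             # a single tuple

lowest, highest = result  # tuple unpacking into two names
print(lowest, highest)

# Attempting to change an element raises a TypeError
try:
    result[0] = 0
except TypeError as err:
    print("TypeError:", err)
```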

## <u>Dictionaries</u>
Dictionaries map **keys** to **values**. They form the basis of much of Python's internal implementation. You will also see them used to specify settings in many packages, such as the plotting package Matplotlib. 

Dictionaries are used much like, well, dictionaries!
* A dictionary maps a word (**key**) to its definition (**value**).

Another good analogy for a Python dictionary is a phonebook:
* A phonebook maps a person (**key**) to a phone number (**value**).

Much like a dictionary or phonebook, Python dictionaries are useful when you have data points that each have a unique key that you might want to look up.

Dictionaries can be created via a comma-separated list of ``key:value`` pairs within **{curly braces}**, or by using the `dict()` function.

In [None]:
numbers = {'one':1, 'two':2, 'three':3}
# or
numbers = dict(one=1, two=2, three=3)
numbers

You access items in a dictionary using similar syntax as lists and tuples (with brackets [ ]). However, instead of using a numbered index, you have to use a valid **key** in the dictionary.

In [None]:
# Access a value via the key
numbers['two']

New items can be added to the dictionary using indexing as well:

In [None]:
# Set a new key:value pair
numbers['ninety'] = 90
print(numbers)
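Looking up a key that does not exist raises a `KeyError`. The sketch below, using a made-up dictionary `ages`, shows three common ways of working with keys: testing with `in`, reading with `.get()` and a default, and looping over `.items()`.

```python
ages = {'Lisa': 24, 'Jon': 31}   # hypothetical data

# Indexing with a missing key raises a KeyError
try:
    ages['Ellie']
except KeyError as err:
    print("KeyError:", err)

# `in` tests whether a key exists
print('Lisa' in ages, 'Ellie' in ages)

# .get() returns None, or a default you supply, instead of raising
print(ages.get('Ellie'))
print(ages.get('Ellie', 0))

# .items() yields (key, value) tuples, convenient in a for loop
for name, age in ages.items():
    print(name, age)
```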

You can also store data in a "tiered" way, with an entry reached through **multiple keys**, by nesting dictionaries. To do this, nest an extra pair of **curly braces {}** within the dictionary entry for each additional level of keys.

In [None]:
phonebook = {'Alice': {'mobile': {'US':1234567,
                                  'UK':1122334},
                       'home':7654321},
            'Matthew':{'mobile':9876543,
                       'home':234567}}
phonebook

To access specific entries, you would index all the keys corresponding to the entry, in the order that they are nested:

In [None]:
print(phonebook['Alice'])
print(phonebook['Alice']['mobile'])
print(phonebook['Alice']['mobile']['US'])

Note: Prior to version 3.6, dictionaries **did not maintain any order** for their entries. [From Python 3.6 onwards](https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-compactdict), the standard `dict` in CPython maintains insertion order, and since Python 3.7 this behavior is guaranteed by the language. 
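The insertion-order behavior can be checked with a short sketch, assuming Python 3.7 or later. The dictionary `order` and its keys are made up for illustration.

```python
order = {}
order['zebra'] = 1
order['apple'] = 2
order['mango'] = 3

# Keys come back in the order they were inserted, not alphabetically
print(list(order))

# Updating an existing key keeps its original position
order['zebra'] = 10
print(list(order.items()))
```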

### 5. Exercises: Dictionaries

**Ex 5.1:**

* Create a dictionary that stores the scientific names of the following trees:

        Scots Pine: Pinus sylvestris
        Norway Spruce: Picea abies
        Aleppo Pine: Pinus halepensis 

* Access the scientific name for Norway Spruce.

In [None]:
# Create the dictionary

In [None]:
# Access the entry

**Ex 5.2:**
* Create a dictionary that contains population information from administrative regions of Spain and the USA:

        Spain:
            Catalonia: 7.56E6
            Asturias: 1.022E6
            Andalusia: 8.49E6
        USA:
            California: 39.24E6
            Missouri: 6.17E6


* Access the population of all the administrative regions of Spain.
* Access the population of Missouri.

In [None]:
# Create the dictionary

In [None]:
# Access all administrative regions of Spain

In [None]:
# Access Missouri

## <u>Test if sequence is empty</u>

The recommended way of testing whether a container (list, tuple, string or dictionary) is empty is to use its "implicit booleanness": an empty container evaluates to `False` and a non-empty one to `True`.

In [None]:
a = []
if not a:
  print("Sequence is empty:",a)

b={'one':1}
if b:
  print("Sequence is not empty:",b)
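The same rule applies to all the built-in containers. The following sketch prints the truth value of several empty containers and shows one caveat: a container is "truthy" as soon as it holds any element, even if that element is itself falsy.

```python
empty_containers = [[], (), '', {}, set()]
for container in empty_containers:
    # bool() shows the truth value Python uses in an `if` statement
    print(repr(container), bool(container))

# A container holding a single falsy element is still non-empty, hence truthy
print(bool([0]), bool(['']))
```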

## <u>Bonus Points: Sets</u>

The 4th basic data container is the `set`, which is an unordered collection of **unique** items.
Sets are defined much like lists and tuples, except they use the curly brackets of dictionaries.

Sets do not contain duplicate entries. Because they are implemented as hash tables, testing whether an item is in a set is typically much faster than searching a list, especially for large collections. See:
http://stackoverflow.com/questions/2831212/python-sets-vs-lists 

In [None]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

In [None]:
a = {1, 1, 2}

In [None]:
a
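A common practical use of this property is removing duplicates from a list by converting it to a set. The sketch below uses a made-up list of country codes. Since sets are unordered, the result is sorted before printing to get a predictable order.

```python
samples = ['SP', 'US', 'FR', 'SP', 'FR', 'MX']   # hypothetical data

unique_samples = set(samples)          # duplicates collapse to one entry
print(len(samples), len(unique_samples))

# Sets are unordered, so sort the result if a stable order matters
print(sorted(unique_samples))

# Membership test with `in`
print('MX' in unique_samples, 'UK' in unique_samples)
```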

If you're familiar with the mathematics of sets, you'll be familiar with operations like the union, intersection, difference, symmetric difference, and others.
Python's sets have all of these operations built-in, via methods or operators.
For each, we'll show the two equivalent methods:

In [None]:
# union: items appearing in either
primes | odds      # with an operator
primes.union(odds) # equivalently with a method

In [None]:
# intersection: items appearing in both
primes & odds             # with an operator
primes.intersection(odds) # equivalently with a method

In [None]:
# difference: items in primes but not in odds
primes - odds           # with an operator
primes.difference(odds) # equivalently with a method

In [None]:
# symmetric difference: items appearing in only one set
primes ^ odds                     # with an operator
primes.symmetric_difference(odds) # equivalently with a method
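The operator and method forms give the same result when both operands are sets, but they differ in one detail: the methods accept any iterable, while the operators require a set on both sides. A minimal sketch, using an illustrative set `primes_set`:

```python
primes_set = {2, 3, 5, 7}

# The method form accepts any iterable, here a list
print(primes_set.union([1, 3, 9]))

# The operator form requires both operands to be sets
try:
    primes_set | [1, 3, 9]
except TypeError as err:
    print("TypeError:", err)
```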

***

# <u>Solutions</u>

### 1. Solutions: List creation

**Ex 1.1**: Create and print a list with the integer numbers 2015 to 2022.

In [None]:
years = [2015,2016,2017,2018,2019,2020,2021,2022]
years

**Ex 1.2**: Create and print a list with the names of countries of the United Kingdom.

In [None]:
countries = ['Scotland', 'Wales','England','Northern Ireland']
countries

**Ex 1.3**:
* Create two lists, each with a date as year, month, day: e.g. 1789,7,14
* Create and print a list containing the two previous lists

In [None]:
bastille = [1789,7,14]
independence = [1776,7,4]
dates = [bastille, independence]
dates

### 2. Solutions: List indexing

**Ex 2.1**: Using the list with values 2015 to 2022
* Print the first two values in the list
* Print the even years in the list
* Print the two last values in the list

In [None]:
print(years[0:2])
print(years[1::2])
print(years[-2:])

### 3. Solutions: Methods and attributes of Class List

**Ex 3.1**: 
* Copy and paste the following list into the cell below:
        data = [10,12,13,18,'error',22]
* Append the number 24 to the list and print.
* Insert the number 15 to the list such that the elements are in ascending order and print.
    * Hint: don't forget Python indexing starts at 0!
* Remove the `'error'` string in the list and print.

In [None]:
data = [10,12,13,18,'error',22]
data.append(24)
print(data)
data.insert(3,15)
print(data)
data.remove('error')
data

**Ex 3.2**:
* Copy and paste the following list into the cell below:
        countries = ['Hungary','Montenegro','Turkey','Finland','Lithuania']
* Alphabetize the list and print.
* Reverse the order of the list and print.

In [None]:
countries = ['Hungary','Montenegro','Turkey','Finland','Lithuania']
countries.sort()
print(countries)

# Method 1: a reversed copy via slicing (the list itself is unchanged)
print(countries[::-1])

# Method 2: reverse the list in place
countries.reverse()
print(countries)

# Method 3: sort in descending order in place
countries.sort(reverse=True)
print(countries)

### 4. Solutions: List comprehensions

#### Ex 4.1: Unit Conversion
* The following list contains temperature data in Celsius: 
        T_C = [15,7,11,20,13,4]
* Copy and paste the line above into the cell below and convert the data in `T_C` into Kelvin.
    * Note: $T_{Kelvin} = T_{Celsius} + 273.15$

In [None]:
T_C = [15,7,11,20,13,4]
T_K = [T+273.15 for T in T_C]
T_K

#### Ex 4.2: Data Selection
* Using the variable `dataID` from above, extract all the data IDs for the year 2013 (copied here for your convenience).
        dataID = ['SP2012','US2014','FR2016','SP2013','FR2013','MX2019']

In [None]:
[ID for ID in dataID if '2013' in ID]

#### Ex 4.3: Data Cleaning
* Using the variable `T_C` from Ex 4.1 (copied here for your convenience), extract all values above 10 degrees.
        T_C = [15,7,11,20,13,4]

In [None]:
[T for T in T_C if T>10]

### 5. Solutions: Dictionaries

**Ex 5.1:**

* Create a dictionary that stores the scientific names of the following trees:
        Scots Pine: Pinus sylvestris
        Norway Spruce: Picea abies
        Aleppo Pine: Pinus halepensis 
* Access the scientific name for Norway Spruce.

In [None]:
tree_dict = {'Scots Pine': 'Pinus sylvestris',
             'Norway Spruce': 'Picea abies',
             'Aleppo Pine': 'Pinus halepensis'}

In [None]:
tree_dict['Norway Spruce']

**Ex 5.2:**
* Create a dictionary that contains population information from administrative regions of Spain and the USA:
        Spain:
            Catalonia: 7.56E6
            Asturias: 1.022E6
            Andalusia: 8.49E6
        USA:
            California: 39.24E6
            Missouri: 6.17E6


* Access the population of all the administrative regions of Spain.
* Access the population of Missouri.

In [None]:
pop_dict = {'Spain':
              {'Catalonia': 7.56E6,
              'Asturias': 1.022E6,
              'Andalusia': 8.49E6},
            'USA':
              {'California': 39.24E6,
              'Missouri': 6.17E6}}

In [None]:
pop_dict['Spain']

In [None]:
pop_dict['USA']['Missouri']

## References
* *A Whirlwind Tour of Python* by Jake VanderPlas (O’Reilly). Copyright 2016 O’Reilly Media, Inc., 978-1-491-96465-1
* The [python documentation](https://docs.python.org/3/library/stdtypes.html) of standard types