# Collection Data Types

This file reviews the data types that Python uses to store collections of data. They include:
* List 
* Tuple 
* Set 
* Dictionary

## Lists

* List is a collection data type.
* List is ordered and mutable.
* It allows duplicate members.

### Creating a list
Lists are created with square brackets or the built-in list function.

In [4]:
list_1 = ["banana", "cherry", "apple"]
print(list_1)

# Or create an empty list with the list function
list_2 = list()
print(list_2)

# Lists allow different data types
list_3 = [5, True, "apple"]
print(list_3)

# Lists allow duplicates
list_4 = [0, 0, 1, 1]
print(list_4)

['banana', 'cherry', 'apple']
[]
[5, True, 'apple']
[0, 0, 1, 1]


### Accessing and changing items in a list
You access the list items by referring to the index number. Note that the indices start at 0.
To change items, just refer to the index number and assign a new value.

In [5]:
item = list_1[0]
print(item)

# You can also use negative indexing, e.g -1 refers to the last item,
# -2 to the second last item, and so on
item = list_1[-1]
print(item)

# Lists can be altered after their creation
list_1[2] = "lemon"
print(list_1)

banana
apple
['banana', 'cherry', 'lemon']


### Useful functions and methods for list
* len(): get the number of elements in a list
* append() : adds an element to the end of the list
* insert() : adds an element at the specified position
* pop() : removes and returns the item at the given position. Default is the last item.
* remove() : removes an identified item from the list
* clear() : removes all items from the list
* reverse() : reverses the order of the items in the list
* sort() : sorts the items in ascending order

In [6]:
my_list = ["banana", "cherry", "apple"]

# len() : get the number of elements in a list
print("Length:", len(my_list))

# append() : adds an element to the end of the list
my_list.append("orange")

# insert() : adds an element at the specified position
my_list.insert(1, "blueberry")
print(my_list)

# pop() : removes and returns the item at the given position, default is the last item
item = my_list.pop()
print("Popped item: ", item)

# remove() : removes an item from the list
my_list.remove("cherry") # raises ValueError if the item is not in the list
print(my_list)

# clear() : removes all items from the list
my_list.clear()
print(my_list)

# reverse() : reverse the items
my_list = ["banana", "cherry", "apple"]
my_list.reverse()
print('Reversed: ', my_list)

# sort() : sort items in ascending order
my_list.sort()
print('Sorted: ', my_list)

# use sorted() to get a new list, and leave the original unaffected.
# sorted() works on any iterable type, not just lists
my_list = ["banana", "cherry", "apple"]
new_list = sorted(my_list)

Length: 3
['banana', 'blueberry', 'cherry', 'apple', 'orange']
Popped item:  orange
['banana', 'blueberry', 'apple']
[]
Reversed:  ['apple', 'cherry', 'banana']
Sorted:  ['apple', 'banana', 'cherry']
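
The comments above note that sorted() works on any iterable and leaves the original untouched, but the cell never prints the result. A minimal sketch of that behaviour follows; the variable names are illustrative and not from the original notebook.

```python
# sorted() accepts any iterable and always returns a new list
letters = sorted("cherry")                     # a string is iterable
numbers = sorted((3, 1, 2))                    # so is a tuple
descending = sorted([3, 1, 2], reverse=True)   # reverse=True sorts in descending order

original = ["banana", "cherry", "apple"]
new_list = sorted(original)

print(letters)      # ['c', 'e', 'h', 'r', 'r', 'y']
print(numbers)      # [1, 2, 3]
print(descending)   # [3, 2, 1]
print(original)     # ['banana', 'cherry', 'apple'] (unchanged)
print(new_list)     # ['apple', 'banana', 'cherry']
```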


### Creating list with repeated elements, concatenation, and converting string to list

In [7]:
# create list with repeated elements
list_with_zeros = [0] * 5
print(list_with_zeros)

# concatenation
list_concat = list_with_zeros + my_list
print(list_concat)

# convert string to list
string_to_list = list('Hello')
print(string_to_list)

[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 'banana', 'cherry', 'apple']
['H', 'e', 'l', 'l', 'o']


### Copy a list
Be careful when copying references, as modifying the copy can also affect the original!

In [8]:
list_org = ["banana", "cherry", "apple"]

# this just copies the reference to the list, so be careful!
list_copy = list_org

# now modifying the copy also affects the original
list_copy.append(True)
print(list_copy)
print(list_org)

# use copy(), or list(x) to actually copy the list
# slicing also works: list_copy = list_org[:]
list_org = ["banana", "cherry", "apple"]

list_copy = list_org.copy()
# list_copy = list(list_org)
# list_copy = list_org[:]

# now modifying the copy does not affect the original
list_copy.append(True)
print(list_copy)
print(list_org)

['banana', 'cherry', 'apple', True]
['banana', 'cherry', 'apple', True]
['banana', 'cherry', 'apple', True]
['banana', 'cherry', 'apple']
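
One caveat worth sketching: copy(), list(x), and slicing all create a shallow copy, so nested mutable items are still shared with the original. The example below assumes a nested list and uses copy.deepcopy from the standard library for a fully independent copy; the variable names are illustrative.

```python
import copy

nested_org = [[1, 2], [3, 4]]

shallow = nested_org.copy()        # new outer list, but the same inner lists
deep = copy.deepcopy(nested_org)   # the inner lists are copied as well

shallow[0].append(99)   # mutates an inner list that nested_org also holds
deep[1].append(77)      # only affects the deep copy

print(nested_org)   # [[1, 2, 99], [3, 4]]
print(shallow)      # [[1, 2, 99], [3, 4]]
print(deep)         # [[1, 2], [3, 4, 77]]
```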


### Iterating the list
The elements inside the list can be iterated.

In [9]:
# Iterating over a list by using a for in loop
for i in list_1:
    print(i)

banana
cherry
lemon


### Slicing the list
Access sub-parts of the list with the colon (:) syntax, just as with strings.

In [10]:
# a[start:stop:step], default step is 1
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = a[1:3] # Note that the last index is not included
print('b')
print(b)
c = a[2:] # until the end
print('c')
print(c)
d = a[:3] # from beginning
print('d')
print(d)
a[0:3] = [0] # replace sub-parts, you need an iterable here
print('a')
print(a)
e = a[::2] # start to end with every second item
print('e')
print(e)
f = a[::-1] # reverse the list with a negative step
print('f')
print(f)
g = a[:] # copy a list with slicing
print('g')
print(g)

b
[2, 3]
c
[3, 4, 5, 6, 7, 8, 9, 10]
d
[1, 2, 3]
a
[0, 4, 5, 6, 7, 8, 9, 10]
e
[0, 5, 7, 9]
f
[10, 9, 8, 7, 6, 5, 4, 0]
g
[0, 4, 5, 6, 7, 8, 9, 10]


### List comprehension
An elegant and fast way to create a new list from an existing list. A list comprehension consists of an expression followed by a for clause inside square brackets.

In [11]:
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [i * i for i in a] # squares each element
print(b)

[1, 4, 9, 16, 25, 36, 49, 64]
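
A comprehension can also filter items with an optional if clause after the for clause. The sketch below shows this next to the equivalent explicit loop; the names even_squares and result are illustrative.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8]

# keep only the even numbers, then square them
even_squares = [i * i for i in a if i % 2 == 0]
print(even_squares)   # [4, 16, 36, 64]

# the same logic written as a regular loop
result = []
for i in a:
    if i % 2 == 0:
        result.append(i * i)
print(result == even_squares)   # True
```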


### Nested lists
Lists can contain other lists (or other container types).

In [12]:
a = [[1, 2], [3, 4]]
print(a)
print(a[0])

[[1, 2], [3, 4]]
[1, 2]
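
To reach an item inside an inner list, chain the index operators. A short sketch using the same nested list (the variable names item and flat are illustrative):

```python
a = [[1, 2], [3, 4]]

item = a[1][0]    # second inner list, first element
print(item)       # 3

a[0][1] = 20      # inner lists are mutable too
print(a)          # [[1, 20], [3, 4]]

# collect every element of every inner list into one flat list
flat = [value for inner in a for value in inner]
print(flat)       # [1, 20, 3, 4]
```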


## Tuple
* Tuples are similar to lists; the main difference is that tuples are immutable.
* By convention, tuples are often used for heterogeneous (different) data types and lists for homogeneous (similar) data types.
* Since tuples are immutable, iterating through a tuple can be slightly faster than iterating through a list.

### Creating a tuple
Tuples are created with comma-separated values, usually enclosed in round brackets, or with the built-in tuple function.

In [13]:
tuple_1 = ("Max", 28, "New York")
tuple_2 = "Linda", 25, "Miami" # Parentheses are optional

# Special case: a tuple with only one element needs to have a comma at the end, 
# otherwise it is not recognized as tuple
tuple_3 = (25,)
print(tuple_1)
print(tuple_2)
print(tuple_3)

# Or convert an iterable (list, dict, string) with the built-in tuple function
tuple_4 = tuple([1,2,3])
print(tuple_4)

('Max', 28, 'New York')
('Linda', 25, 'Miami')
(25,)
(1, 2, 3)


### Access elements
You access the tuple items by referring to the index number. Note that the indices start at 0.

In [14]:
item = tuple_1[0]
print(item)
# You can also use negative indexing, e.g -1 refers to the last item,
# -2 to the second last item, and so on
item = tuple_1[-1]
print(item)

Max
New York


### Add or change items
Tuples are immutable, so adding or changing items is not possible and raises a TypeError.

In [15]:
#tuple_1[2] = "Boston"
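
The commented-out assignment above can be run safely inside a try/except block. This sketch also shows one way to get a modified tuple, by building a new one from slices; the names error_message and tuple_new are illustrative.

```python
tuple_1 = ("Max", 28, "New York")

try:
    tuple_1[2] = "Boston"       # item assignment is not supported
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# build a new tuple instead of changing the existing one
tuple_new = tuple_1[:2] + ("Boston",)
print(tuple_new)   # ('Max', 28, 'Boston')
print(tuple_1)     # ('Max', 28, 'New York')
```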

### Iterating and check if an item exists in tuple
Iterating over a tuple by using a for in loop

In [16]:
for i in tuple_1:
    print(i)

if "New York" in tuple_1:
    print("yes")
else:
    print("no")


Max
28
New York
yes


### Useful functions and methods for tuple
* len() : get the number of elements in a tuple
* count(x) : return the number of items that are equal to x
* index(x) : return the index of the first item that is equal to x

In [17]:
my_tuple = ('a','p','p','l','e',)

# len() : get the number of elements in a tuple
print(len(my_tuple))

# count(x) : Return the number of items that is equal to x
print(my_tuple.count('p'))

# index(x) : Return index of first item that is equal to x
print(my_tuple.index('l'))

5
2
3


### Repetition, concatenation, and conversion

In [18]:
# repetition
my_tuple = ('a', 'b') * 5
print(my_tuple)

# concatenation
my_tuple = (1,2,3) + (4,5,6)
print(my_tuple)

# convert list to a tuple and vice versa
my_list = ['a', 'b', 'c', 'd']
list_to_tuple = tuple(my_list)
print(list_to_tuple)

tuple_to_list = list(list_to_tuple)
print(tuple_to_list)

# convert string to tuple
string_to_tuple = tuple('Hello')
print(string_to_tuple)

('a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b')
(1, 2, 3, 4, 5, 6)
('a', 'b', 'c', 'd')
['a', 'b', 'c', 'd']
('H', 'e', 'l', 'l', 'o')


### Slicing a tuple
Access sub-parts of the tuple with the colon (:) syntax, just as with strings.

In [19]:
# a[start:stop:step], default step is 1
a = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
b = a[1:3] # Note that the last index is not included
print(b)
b = a[2:] # until the end
print(b)
b = a[:3] # from beginning
print(b)
b = a[::2] # start to end with every second item
print(b)
b = a[::-1] # reverse tuple
print(b)

(2, 3)
(3, 4, 5, 6, 7, 8, 9, 10)
(1, 2, 3)
(1, 3, 5, 7, 9)
(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)


### Unpack tuple

In [20]:
# the number of variables has to match the number of tuple elements
tuple_1 = ("Max", 28, "New York")
name, age, city = tuple_1
print(name)
print(age)
print(city)

# tip: unpack multiple elements to a list with *
my_tuple = (0, 1, 2, 3, 4, 5)
item_first, *items_between, item_last = my_tuple
print(item_first)
print(items_between)
print(item_last)

Max
28
New York
0
[1, 2, 3, 4]
5


### Nested tuples
Tuples can contain other tuples (or other container types).

In [21]:
a = ((0, 1), ('age', 'height'))
print(a)
print(a[0])

((0, 1), ('age', 'height'))
(0, 1)


### Compare tuple and list
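The immutability of tuples enables Python to make internal optimizations, so tuples can use less memory and be faster to create than lists. The exact sizes and timings shown below depend on the Python version and platform. Note also that CPython can store a tuple of constants as a single precomputed constant, which largely explains the big gap in the creation timings.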

In [22]:
# compare the size
import sys
my_list = [0, 1, 2, "hello", True]
my_tuple = (0, 1, 2, "hello", True)
print(sys.getsizeof(my_list), "bytes")
print(sys.getsizeof(my_tuple), "bytes")

# compare the execution time of a list vs. tuple creation statement
import timeit
print(timeit.timeit(stmt="[0, 1, 2, 3, 4, 5]", number=1000000))
print(timeit.timeit(stmt="(0, 1, 2, 3, 4, 5)", number=1000000))

120 bytes
80 bytes
0.07349710000005416
0.006673200000022916


## Dictionaries
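* A dictionary is a collection which is changeable and indexed by key. Since Python 3.7, a dictionary preserves insertion order; in earlier versions the order is not guaranteed.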
* A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value. A dictionary is written in braces. Each key is separated from its value by a colon (:), and the items are separated by commas.

### Create a dictionary, and Access items
Create a dictionary with braces, or with the built-in dict function. Access a value by putting its key in square brackets.

In [23]:
my_dict = {"name":"Max", "age":28, "city":"New York"}
print(my_dict)

# or use the dict constructor, note: no quotes necessary for keys
my_dict_2 = dict(name="Lisa", age=27, city="Boston")
print(my_dict_2)

name_in_dict = my_dict["name"]
print(name_in_dict)

# KeyError if no key is found
# print(my_dict["lastname"])


{'name': 'Max', 'age': 28, 'city': 'New York'}
{'name': 'Lisa', 'age': 27, 'city': 'Boston'}
Max


### Add, change, and delete items
Simply add or access a key and assign the value.

In [24]:
# add a new key
my_dict["email"] = "max@xyz.com"
print(my_dict)

# or overwrite the now existing key
my_dict["email"] = "coolmax@xyz.com"
print(my_dict)

# delete a key-value pair
del my_dict["email"]

# this returns the value and removes the key-value pair
print("popped value:", my_dict.pop("age"))

# returns and removes the last inserted key-value pair
# (in versions before Python 3.7 it removes an arbitrary pair)
print("popped item:", my_dict.popitem())

print(my_dict)

# clear() : remove all pairs
# my_dict.clear()

{'name': 'Max', 'age': 28, 'city': 'New York', 'email': 'max@xyz.com'}
{'name': 'Max', 'age': 28, 'city': 'New York', 'email': 'coolmax@xyz.com'}
popped value: 28
popped item: ('city', 'New York')
{'name': 'Max'}


### Check for keys, and loop through dictionary
Use 'in' or a try/except block to check whether a key exists. To loop through a dictionary, use its 'keys()', 'values()', and 'items()' methods.

In [25]:
my_dict = {"name":"Max", "age":28, "city":"New York"}
# use if .. in ..
if "name" in my_dict:
    print(my_dict["name"])

# use try except
try:
    print(my_dict["firstname"])
except KeyError:
    print("No key found")  

# loop over keys
for key in my_dict:
    print(key, my_dict[key])

# loop over keys
for key in my_dict.keys():
    print(key)

# loop over values
for value in my_dict.values():
    print(value)

# loop over keys and values
for key, value in my_dict.items():
    print(key, value)

Max
No key found
name Max
age 28
city New York
name
age
city
Max
28
New York
name Max
age 28
city New York
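
Besides 'in' and try/except, the built-in get() method returns a fallback value instead of raising a KeyError when the key is missing. A short sketch (the "unknown" default and the variable names are illustrative):

```python
my_dict = {"name": "Max", "age": 28, "city": "New York"}

found = my_dict.get("name")                    # key exists
missing = my_dict.get("firstname")             # no key: returns None
fallback = my_dict.get("firstname", "unknown") # no key: returns the given default

print(found)      # Max
print(missing)    # None
print(fallback)   # unknown
```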


### Copy a dictionary
Like other types of collection data in Python, be careful when copying references.

In [26]:
dict_org = {"name":"Max", "age":28, "city":"New York"}

# this just copies the reference to the dict, so be careful
dict_copy = dict_org

# now modifying the copy also affects the original
dict_copy["name"] = "Lisa"
print(dict_copy)
print(dict_org)

# use copy(), or dict(x) to actually copy the dict
dict_org = {"name":"Max", "age":28, "city":"New York"}

dict_copy = dict_org.copy()
# dict_copy = dict(dict_org)

# now modifying the copy does not affect the original
dict_copy["name"] = "Lisa"
print(dict_copy)
print(dict_org)

{'name': 'Lisa', 'age': 28, 'city': 'New York'}
{'name': 'Lisa', 'age': 28, 'city': 'New York'}
{'name': 'Lisa', 'age': 28, 'city': 'New York'}
{'name': 'Max', 'age': 28, 'city': 'New York'}


### Merge two dictionaries
This can be done with the 'update()' method, which modifies the dictionary in place.

In [27]:
# Use the update() method to merge 2 dicts
# existing keys are overwritten, new keys are added
my_dict = {"name":"Max", "age":28, "email":"max@xyz.com"}
my_dict_2 = dict(name="Lisa", age=27, city="Boston")

my_dict.update(my_dict_2)
print(my_dict)

{'name': 'Lisa', 'age': 27, 'email': 'max@xyz.com', 'city': 'Boston'}
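
Since update() changes the dictionary it is called on, a sketch of one alternative that leaves both inputs untouched may be useful: unpacking both dictionaries into a new one with ** (available since Python 3.5). The name merged is illustrative.

```python
my_dict = {"name": "Max", "age": 28, "email": "max@xyz.com"}
my_dict_2 = dict(name="Lisa", age=27, city="Boston")

# later entries win, so values from my_dict_2 overwrite those from my_dict
merged = {**my_dict, **my_dict_2}

print(merged)    # {'name': 'Lisa', 'age': 27, 'email': 'max@xyz.com', 'city': 'Boston'}
print(my_dict)   # {'name': 'Max', 'age': 28, 'email': 'max@xyz.com'}
```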


### Possible key types
Any hashable type, such as a string or a number, can be used as a key. A tuple can also be used if it contains only hashable (typically immutable) elements. A list cannot be used as a key, because lists are mutable and therefore unhashable.

In [28]:
# use numbers as key, but be careful
my_dict = {3: 9, 6: 36, 9:81}
# do not mistake the keys as indices of a list, e.g my_dict[0] is not possible here
print(my_dict[3], my_dict[6], my_dict[9])

# use a tuple with immutable elements (e.g. number, string)
my_tuple = (8, 7)
my_dict = {my_tuple: 15}

print(my_dict[my_tuple])
# print(my_dict[8, 7])

# a list is not possible because it is mutable (unhashable)
# this will raise a TypeError:
# my_list = [8, 7]
# my_dict = {my_list: 15}

9 36 81
15
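
The commented-out list example can be run safely inside a try/except block. This sketch also shows the usual workaround of converting the list to a tuple first; the name error_message is illustrative.

```python
my_list = [8, 7]

try:
    my_dict = {my_list: 15}      # lists are unhashable
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# convert the list to a tuple to use its contents as a key
my_dict = {tuple(my_list): 15}
print(my_dict[8, 7])   # 15
```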


### Nested dictionaries
The values can also be container types (e.g. lists, tuples, dictionaries).

In [29]:
my_dict_1 = {"name": "Max", "age": 28}
my_dict_2 = {"name": "Alex", "age": 25}
nested_dict = {"dictA": my_dict_1,
               "dictB": my_dict_2}
print(nested_dict)

{'dictA': {'name': 'Max', 'age': 28}, 'dictB': {'name': 'Alex', 'age': 25}}
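
Values inside a nested dictionary are reached by chaining keys. A short sketch using the same data (the loop variable names label and person are illustrative):

```python
nested_dict = {"dictA": {"name": "Max", "age": 28},
               "dictB": {"name": "Alex", "age": 25}}

# chain the keys to read an inner value
inner_name = nested_dict["dictA"]["name"]
print(inner_name)   # Max

# inner dictionaries can be changed in the same way
nested_dict["dictB"]["age"] = 26

# loop over the outer dictionary and read from each inner one
for label, person in nested_dict.items():
    print(label, person["name"], person["age"])
```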
