# Set

So far, we have covered two types of collections in Python: list and tuple. The advantage of a tuple is that it is immutable, and the advantage of a list is that it can be sorted in place. The disadvantage of both is that finding a specific value is slow, because Python may have to inspect every element. For this purpose, Python has a 'set'. A set is a collection in which all elements are unique. The values are not stored at indices; instead they are placed by hashing, which lets Python find an element very efficiently, regardless of the size of the set.

A literal set is created by listing elements between curly braces `{}`. You are allowed to include duplicates, but as you can see, the end result is that every value appears only once.

In [None]:
places = {'Delft', 'Amsterdam', 'Delft', 'Leiden'}

In [None]:
places

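Sets are typically used to efficiently maintain a collection of unique values, to quickly check whether a value is in the collection, and to iterate over the values. The following table lists common operations on sets; each row is evaluated against the original `a` and `b` defined in the cell below.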

In [None]:
a = {2, 5, 3, 5}
b = {5, 4, 2}

| code | result | comment |
|:-----|:-------|:--------|
| 2 in a | True | True if the given element is in the set |
| a.add(9) | a == {2, 3, 5, 9} | adds an element to the set |
| a.remove(2) | a == {3, 5} | removes an element from the set |
| len(a) | 3 | works as for any collection, but note that duplicates are counted only once |
| sum(a) | 10 | adds up the unique elements |
| set([1, 1, 2]) | {1, 2} | creates a set from the unique elements in a list |
| a.union(b) | {2, 3, 4, 5} | returns the union of two sets (a remains unchanged) |
| a.intersection(b) | {2, 5} | returns the intersection of two sets (a remains unchanged) |
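
The rows of the table can be verified directly. The cell below is a minimal sketch, assuming that every row starts from the original `a`, which is why `add` and `remove` are applied to copies. The names `added`, `removed`, `union` and `common` are only there for illustration.

```python
a = {2, 5, 3, 5}   # the duplicate 5 is dropped, so a == {2, 3, 5}
b = {5, 4, 2}

membership = 2 in a    # True
size = len(a)          # 3
total = sum(a)         # 10

# add and remove change the set in place, so work on copies
added = a.copy()
added.add(9)
removed = a.copy()
removed.remove(2)

# union and intersection return new sets and leave a untouched
# (note that the method is called intersection, not intersect)
union = a.union(b)
common = a.intersection(b)

print(membership, size, total)
print(added == {2, 3, 5, 9}, removed == {3, 5})
print(union == {2, 3, 4, 5}, common == {2, 5})
```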

The last few examples show that we can apply set operations to sets. There are also methods to check whether one set is a subset of another (`issubset`) and to get the `difference` between two sets.
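
As a small sketch of these two methods, using the same values as before (the set `small` is made up for this example):

```python
a = {2, 3, 5}
b = {5, 4, 2}
small = {2, 5}

# issubset is True when every element of small is also in a
small_in_a = small.issubset(a)
a_in_b = a.issubset(b)          # False, because 3 is not in b

# difference keeps the elements of the first set that are not in the second
only_in_a = a.difference(b)
only_in_b = b.difference(a)

print(small_in_a, a_in_b)
print(only_in_a == {3}, only_in_b == {4})
```

Note that `difference` is not symmetric: `a.difference(b)` and `b.difference(a)` generally give different results.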

There are also some operations on lists that cannot be applied to sets: indexing, slicing, sorting in place, and inserting at a position. Note that although a printed set may look sorted, there is no guarantee about the order of its elements.
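
The sketch below illustrates this for the `places` example: indexing a set raises a `TypeError`, while the built-in `sorted` can still be used because it returns a new list and leaves the set as it is.

```python
places = {'Delft', 'Amsterdam', 'Leiden'}

# a set has no positions, so indexing is not supported
try:
    places[0]
    indexing_works = True
except TypeError:
    indexing_works = False

# sorted() builds a new, ordered list from the elements of the set
ordered = sorted(places)

print(indexing_works)
print(ordered)   # ['Amsterdam', 'Delft', 'Leiden']
```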

# Assignments

#### Check whether the values 3 and 5 are both in `b`, in one expression

In [None]:
b = {3, 5, 7}

In [None]:
%%assignment
# YOUR CODE HERE

In [None]:
%%check
b = {3, 7, 8}
not result
b = {5, 7, 9}
not result
b = {3, 5, 7}
result

#### Create a literal set `vowel`, with the vowels 'a', 'e', 'i', 'o', 'u'.

In [None]:
%%assignment
# YOUR CODE HERE

In [None]:
%%check
len(vowel) == 5
'aeiou' == ''.join(sorted(vowel))

#### Write a function `check_vowel`, that takes a `character` and returns True if the character is a vowel, or False otherwise.

Note that you do not need to use an `if` to return True or False: the result of a comparison is already a Boolean, so you can return it directly.

In [None]:
%%assignment
# YOUR CODE HERE

In [None]:
%%check
signature check_vowel character
vowel = 'aeiou'
all(map(check_vowel, 'aeiou')) # Your check_vowel function does not return True for every vowel
import string
sum(map(check_vowel, string.ascii_lowercase)) == 5

# Set vs List

Note that for this last assignment, we could have used a `list` instead of a `set`. However, for large collections a lookup with the `in` operator is orders of magnitude faster in a `set` than in a `list`. Therefore, when you need to do many lookups in a large collection, you should prefer a set.

We will do a little experiment with a set and a list that are both populated with all even numbers smaller than 1 million.

In [None]:
numbers_set = { i for i in range(1000000) if i % 2 == 0 }

In [None]:
numbers_list = [ i for i in range(1000000) if i % 2 == 0 ]

Then we do a thousand lookups in each of them. The `%%time` cell magic reports the time it takes to execute a cell.

In [None]:
%%time
for i in range(1000):
    if i in numbers_set:
        pass

In [None]:
%%time
for i in range(1000):
    if i in numbers_list:
        pass

Since a µs is $10^{-6}$ s, it is easy to see why we use sets for lookups: in this experiment they are around 40,000 times faster.
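
The `%%time` magic only works inside a notebook. As a rough sketch of the same experiment in plain Python, the standard `timeit` module can be used. The collections here are smaller than in the experiment above to keep the run short, the helper `lookups` is a made-up name, and the measured times will differ per machine.

```python
import timeit

size = 100_000   # assumption: smaller than above, only to keep the run short
evens_set = set(range(0, size, 2))
evens_list = list(range(0, size, 2))

def lookups(collection):
    """Count how many of the numbers 0..99 occur in the collection."""
    return sum(1 for i in range(100) if i in collection)

# time five repetitions of the same hundred lookups on each collection
set_seconds = timeit.timeit(lambda: lookups(evens_set), number=5)
list_seconds = timeit.timeit(lambda: lookups(evens_list), number=5)

print(f"set:  {set_seconds:.6f} s")
print(f"list: {list_seconds:.6f} s")
```

The list is slow mainly because of the odd numbers: to conclude that a value is absent, Python has to compare it with every element of the list, whereas the set can rule it out almost immediately.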