# Python sets and dictionaries

## Lists/tuples (key points of the previous lesson)

Square brackets `[ ... ]` are used to create a new list.  
Parentheses `( ... )` are used to create a new tuple.  
A list/tuple can hold any number of elements.  
Elements in a list/tuple may be repeated (duplicated).  
Each element has its position (index).  
The first element is at index zero.

In [None]:
grades = [ 7, 6, 7, 8, 6, 6.5, 8 ]
print( "Elements:", grades )          # note: print() separates its arguments with a space
print( "Length:", len( grades ) )
print( "4th element:", grades[3] )

The content of a list can be modified: lists are mutable.  
The content of a tuple is fixed: tuples are immutable.

In [None]:
toBuy = [ "yeast", "flour", "oil", "salt" ]
print( "   Start:", toBuy )
toBuy[2] = "olive oil"
print( "Modified:", toBuy )
toBuy.append( "sugar" )
print( "Appended:", toBuy )
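
For comparison, a minimal sketch of what happens when the same kind of assignment is tried on a tuple (the names `toBuyTuple` and `newTuple` are chosen here for illustration). The assignment raises a `TypeError`; the usual workaround is to build a new tuple from slices of the old one:

```python
# A tuple with the same items as the shopping list above
toBuyTuple = ( "yeast", "flour", "oil", "salt" )
try:
    toBuyTuple[2] = "olive oil"      # tuples do not support item assignment
except TypeError as err:
    print( "TypeError:", err )

# Build a new tuple from pieces of the old one instead
newTuple = toBuyTuple[:2] + ( "olive oil", ) + toBuyTuple[3:]
print( "New tuple:", newTuple )
```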


Comprehensions provide a very compact notation to:

- Perform an operation on each element of a list/tuple.
- Use a condition to filter elements of a list/tuple.

In [None]:
# this is a special tuple: note, at index 0 is the text "zero", at index 1 is the text "one", etc.
# such a tuple can be used to map numbers to their names 
# (but note, only numbers starting from 0 can be mapped here; to map other texts/numbers, a dictionary will be used)
digit2name = ( 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine' )
print( "     digit2name:", digit2name )

digits = [ 5, 2, 6, 2, 0, 9, 2, 1, 9 ]                     # some random numbers
print( "         digits:", digits )

digitsNames = [ digit2name[d] for d in digits ]
print( "    digitsNames:", digitsNames )

smallDigits = [ d for d in digits if d <= 3 ]
print( "    smallDigits:", smallDigits )

digitsWithNames = [ (d, digit2name[d] ) for d in digits ]  # tuples are produced here
print( "digitsWithNames:", digitsWithNames )
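
The operation and the condition can also be combined in a single comprehension. A small sketch reusing the data from the cell above (`smallNames` is a name introduced here): the name lookup is applied only to the elements which pass the filter.

```python
digit2name = ( 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine' )
digits = [ 5, 2, 6, 2, 0, 9, 2, 1, 9 ]

# the operation (name lookup) is applied only to elements passing the filter (d <= 3)
smallNames = [ digit2name[d] for d in digits if d <= 3 ]
print( smallNames )   # ['two', 'two', 'zero', 'two', 'one']
```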

Note the built-in functions `list(...)` and `tuple(...)`, which create a list (or a tuple) by iterating over the elements of the object passed as argument:

In [None]:
txt = "Statistics"
list( txt )     # or tuple( txt )

A sequence of integers can be created with the `range(n)` function. Note that `range(n)` is not a list but a lazy `range` object, which can be converted into a list or a tuple:

In [None]:
range( 10 )           # a range object: produces the numbers from 0 to 9 when iterated
tuple( range( 10 ) )  # a tuple: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
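
`range` also accepts a start value and a step. A short sketch (the variable names are illustrative); note that the stop value is never included and that `len(...)` works without building a list:

```python
steps = list( range( 2, 10, 3 ) )      # start 2, stop 10 (excluded), step 3: [2, 5, 8]
countdown = list( range( 5, 0, -2 ) )  # a negative step counts down: [5, 3, 1]
print( steps, countdown )
print( len( range( 1_000_000 ) ) )     # 1000000, computed without creating a list
```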

## Sets

`set` is a built-in data type with the following properties:
- it is a container which can keep any number (0, 1, 2, ...) of other elements
- an element is either `in` the set or `not in` the set - there are no duplicated elements
- elements can be added to a `set` or removed from it 
- an element added to a set cannot ever change, therefore only elements of *immutable* (more precisely: hashable) types are allowed
- `frozenset` is an *immutable* variant of a set: once created it cannot be changed, and for that reason a `frozenset` can be used as an element of another set
- it is possible to iterate over all elements of a set, but no assumptions can be made about iteration order
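
A quick sketch of these properties (the variable names are illustrative): duplicates collapse into one element, a mutable list is rejected as an element, while a tuple or a `frozenset` is accepted:

```python
# duplicated letters collapse into single elements
letters = set( "statistics" )
print( sorted( letters ) )        # ['a', 'c', 'i', 's', 't']

# a mutable list cannot be an element of a set ...
try:
    bad = { [1, 2] }
except TypeError as err:
    print( "TypeError:", err )    # unhashable type: 'list'

# ... but an immutable tuple or frozenset can
pairs = { (1, 2), frozenset( { 3, 4 } ) }
print( len( pairs ) )             # 2
```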

### A new set

Let's consider a *tiramisu* preparation example (note two ways to create a set: `{...}` and `set(...)`):

In [None]:
# See https://www.recipesfromitaly.com/tiramisu-original-italian-recipe/
# A *set* with ingredients needed for tiramisu:
tiramisuIngredients = { "ladyfingers", "mascarpone", "eggs", "sugar", "espresso", "rum", "cocoa" }

# An example *set* of ingredients which might be available
# (note: here first a list is created and then it is converted to a set)
inHouseIngredients = set( [ "eggs", "espresso", "cocoa", "butter", "strawberries" ] )
inHouseIngredients

Use `len(...)` to find the number of elements in a set:

In [None]:
len( tiramisuIngredients )

The type of an object can be checked (as usual) with `type(...)`:

In [None]:
type( tiramisuIngredients )

Note, however, that the *empty set* cannot be created with `{}`: this creates an empty dictionary, an object of a different type:

In [None]:
type( {} )

To create the *empty set* use `set()` with no arguments:

In [None]:
type( set() )

### Set algebra

Standard algebraic operations on sets are [drawn here](set_operations.png).

For the tiramisu example, sets of potential interest can be constructed with the `&` (intersection) and `-` (difference) operators:


In [None]:
# Which ingredients are already available in house:
tiramisuIngredients & inHouseIngredients

In [None]:
# What still needs to be bought to prepare tiramisu:
tiramisuIngredients - inHouseIngredients
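
The two remaining standard operations are union (`|`) and symmetric difference (`^`). A minimal sketch with the same ingredient sets (`allMentioned` and `inOnlyOne` are names introduced here); every operator also has a method equivalent, checked with `assert` below:

```python
tiramisuIngredients = { "ladyfingers", "mascarpone", "eggs", "sugar", "espresso", "rum", "cocoa" }
inHouseIngredients = { "eggs", "espresso", "cocoa", "butter", "strawberries" }

allMentioned = tiramisuIngredients | inHouseIngredients   # union: in either set
inOnlyOne = tiramisuIngredients ^ inHouseIngredients      # symmetric difference: in exactly one set

# method equivalents of the operators (the methods also accept lists, tuples, ...)
assert allMentioned == tiramisuIngredients.union( inHouseIngredients )
assert inOnlyOne == tiramisuIngredients.symmetric_difference( inHouseIngredients )
assert ( tiramisuIngredients & inHouseIngredients ) == tiramisuIngredients.intersection( inHouseIngredients )
assert ( tiramisuIngredients - inHouseIngredients ) == tiramisuIngredients.difference( inHouseIngredients )

print( sorted( inOnlyOne ) )
```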

### Sets: is an element in or not?

Whether a single element is present `in` a set can be tested as follows:

In [None]:
"milk" in tiramisuIngredients

To check whether an element is `not in` a set, use:

In [None]:
"milk" not in tiramisuIngredients

Consider also this example of using `not in` inside a comprehension iterating over a set:

In [None]:
# A True/False map whether an in-house-ingredient is not needed for tiramisu.
[ (ing, ing not in tiramisuIngredients) for ing in inHouseIngredients ]

Note: the `set` methods `issubset(...)` and `issuperset(...)` test whether all elements of one set are also present in the other set. The same tests are provided by the `<=` and `>=` operators. The `==` operator checks whether two sets have identical elements.

In [None]:
tiramisuIngredients.issubset( inHouseIngredients )
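
The same comparisons written with operators, as a small sketch on two illustrative sets; `<` tests for a *proper* subset, and `isdisjoint(...)` checks that two sets share no elements:

```python
needed = { "eggs", "sugar" }
available = { "eggs", "sugar", "milk" }

isSub = needed <= available               # same as needed.issubset( available )
isSuper = available >= needed             # same as available.issuperset( needed )
isProper = needed < available             # subset, but not equal
isSame = needed == { "sugar", "eggs" }    # same elements; order does not matter
noCommon = needed.isdisjoint( { "rum" } ) # no shared elements
print( isSub, isSuper, isProper, isSame, noCommon )   # True True True True True
```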

### Sets: adding/removing elements 

Let's consider a set with some available ingredients:

In [None]:
inHouseIngredients

New elements can be added as follows:

In [None]:
# For a single element:
inHouseIngredients.add( 'bread' )

# For an iterable collection of elements (it can also be another set):
inHouseIngredients.update( [ 'salami', 'tomato' ] )

inHouseIngredients

And here are some examples of how to remove elements:

In [19]:
# For an element currently in the set:
inHouseIngredients.remove( "bread" )

# Note: a KeyError exception would be raised because "BREAD" is not in the set:
# inHouseIngredients.remove( "BREAD" ) 

# For a value which might be in the set (but no error when the value is not there):
inHouseIngredients.discard( "BREAD" )
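
Two more removal methods may be handy: `pop()` removes and returns an *arbitrary* element, and `clear()` empties the set. A short sketch on an illustrative set `leftovers`:

```python
leftovers = { "bread", "salami", "tomato" }
taken = leftovers.pop()          # removes and returns an arbitrary element
print( "Taken:", taken, "| left:", leftovers )
leftovers.clear()                # removes all elements
print( leftovers )               # set()
```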

### Sets: a loop over all elements

The following code performs some operation (here: `print(...)`) *for each* element present in the set.  
The order in which the elements are visited is not defined.

In [None]:
for ing in tiramisuIngredients:
    print( "Needed for tiramisu:", ing )
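
When a predictable order is needed, one option is to iterate over `sorted(...)`, which returns a new sorted list and leaves the set unchanged. A minimal sketch (`orderedIngredients` is a name introduced here):

```python
tiramisuIngredients = { "ladyfingers", "mascarpone", "eggs", "sugar", "espresso", "rum", "cocoa" }
orderedIngredients = sorted( tiramisuIngredients )   # a new list in alphabetical order
for ing in orderedIngredients:
    print( "Needed for tiramisu:", ing )
```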

## Dictionaries

`dict` is a built-in data type with the following properties:
- it is a container which can keep any number (0, 1, 2, ...) of (*key*, *value*) `items`
- the primary goal: using a key it is possible to quickly `get` its corresponding value
- there are no duplicates among the `keys`; it is possible to test whether a key is `in` or `not in` a dict object
- the keys must be of *immutable* type
- a value corresponding to a key can be of a mutable type
- new items can be added
- existing items can be removed
- `for` loops can iterate over all items of a dictionary, or over the `keys`, or over `values`
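
A minimal sketch of the key/value rules above (the dictionaries are illustrative): a tuple works as a key, a list does not, while a list is perfectly fine as a value and can be modified in place:

```python
point2label = { (0, 0): "origin" }       # a tuple key is fine
point2label[ (1, 0) ] = "right"

try:
    point2label[ [0, 1] ] = "up"         # a list key is not allowed
except TypeError as err:
    print( "TypeError:", err )           # unhashable type: 'list'

# values may be mutable, e.g. lists
day2meals = { "Mon": [ "pasta" ] }
day2meals[ "Mon" ].append( "salad" )
print( day2meals )                       # {'Mon': ['pasta', 'salad']}
```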

### A new dictionary

Use the following notation to create a dictionary from `key: value` pairs (note the curly braces: `{`...`}`):

In [None]:
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
day2KCal

The type of the created object is:

In [None]:
type( day2KCal )

Note: repeating the same key (here: `Mon`) overwrites the previously associated value:

In [None]:
{  "Mon": 2330, "Tue": 1990, "Wed": 2150, "Mon": 1000 }

The following code builds a dictionary from two iterable objects (keys and values are paired by position):

In [None]:
days = ( "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun" )      # a tuple
dayKCals = [ 2330, 1990, 2150, 2290, 1920, 2370, 2050 ]         # a list
zip( days, dayKCals )           # an iterator over tuples (day, dayKCal)
list( zip( days, dayKCals ) )   # a list of the tuples produced by zip

day2KCal = dict( zip( days, dayKCals ) )
day2KCal
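
Two details worth knowing, shown in a small sketch (`day2KCalShort` and `day2KCalComp` are illustrative names): `zip` stops at the shorter input, and the same dictionary can be written as a dict comprehension:

```python
days = ( "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun" )
dayKCals = [ 2330, 1990, 2150 ]                 # only three values this time

# zip() stops at the shorter input, so only three items are created
day2KCalShort = dict( zip( days, dayKCals ) )
print( day2KCalShort )                          # {'Mon': 2330, 'Tue': 1990, 'Wed': 2150}

# the same dictionary built with a dict comprehension
day2KCalComp = { d: k for d, k in zip( days, dayKCals ) }
print( day2KCalComp == day2KCalShort )          # True
```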

This is the standard way to find the number of items (here: key-value pairs) in a collection:

In [None]:
len( day2KCal )

Note: this is how a new empty dictionary is created:

In [None]:
{}

### Dict: getting, adding or modifying elements

Let's start with a dictionary:

In [None]:
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
day2KCal

To add a new value or change a value for a single key, use the following:

In [None]:
day2KCal[ "Thu" ] = 2290
day2KCal

Multiple items can be added/updated by calling `update(...)` with an iterable of (key, value) pairs: 

In [None]:
day2KCal.update( [ ( "Fri", 1920 ), ( "Sat", 2370 ) ] )
day2KCal
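
`update(...)` also accepts another dictionary or keyword arguments (for string keys). Assuming Python 3.9 or newer, the `|` operator creates a *new* merged dictionary, where the right-hand side wins for equal keys. A sketch with illustrative values:

```python
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
day2KCal.update( { "Thu": 2290 } )      # from another dictionary
day2KCal.update( Fri=1920 )             # from keyword arguments

# Python 3.9+: | merges into a new dict; the original stays unchanged
weekend = { "Sat": 2370, "Sun": 2050, "Mon": 2000 }
merged = day2KCal | weekend
print( merged[ "Mon" ], day2KCal[ "Mon" ] )     # 2000 2330
```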

The square bracket `[`...`]` operator can be used to get the value for a provided key.  
The key must be present in the dictionary, otherwise a `KeyError` exception is raised:

In [None]:
# day2KCal[ "Monday" ]         # "Monday" is an invalid key and raises KeyError exception
day2KCal[ "Mon" ]              # returns the value for the key "Mon"

The `get( key, defaultValue )` method can be used to avoid exceptions when the key is missing from the dictionary. In that case `defaultValue` is returned:

In [None]:
day2KCal.get( "Monday", 100 )  # does not raise any exception
                               # but it returns the second argument (here 100)
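
If the default value is omitted, `get(...)` returns `None` for a missing key; for a present key the default is ignored. A short sketch:

```python
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
present = day2KCal.get( "Mon", 100 )     # 2330: the key exists, the default is ignored
missing = day2KCal.get( "Monday" )       # None: missing key and no default given
print( present, missing )
```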

The following two examples show how to check whether a key is `in` or is `not in` a dictionary:

In [None]:
"Tuesday" in day2KCal

In [None]:
aKey = "Tuesday"
aKey not in day2KCal

### Dict: removing an element

A few possibilities exist to remove an element:

In [34]:
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
day2KCal.pop( "Tue" )           # removes "Tue" and returns the value it had
                                # raises KeyError if the key is not found

day2KCal.pop( "Monday", None )  # removes the key only if it is present; if it is missing,
                                # returns the second argument instead of raising KeyError

Or:

In [35]:
del day2KCal[ "Mon" ]           # also removes an existing element
                                # raises KeyError if the key is not found
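
Two further options: `popitem()` removes and returns the most recently inserted (key, value) pair (insertion order is guaranteed from Python 3.7 on), and `clear()` removes all items. A small sketch:

```python
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
lastItem = day2KCal.popitem()      # removes the last inserted item
print( lastItem )                  # ('Wed', 2150)
day2KCal.clear()                   # removes all items
print( day2KCal )                  # {}
```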

### Dict: all keys, all values or their pairs

With `values()` it is possible to iterate over all values in a dictionary:

In [None]:
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
day2KCal.values()                  # this is an iterable collection
                                   # it is a view: it will change if the dict changes
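
What "it is a view" means in practice, as a minimal sketch (`kCals` and `snapshot` are illustrative names): a view created before a change still reflects the change, while `list(...)` takes a fixed copy:

```python
day2KCal = { "Mon": 2330, "Tue": 1990 }
kCals = day2KCal.values()               # a view, not a copy
day2KCal[ "Wed" ] = 2150                # modify the dictionary afterwards ...
print( list( kCals ) )                  # ... the view reflects it: [2330, 1990, 2150]
snapshot = list( day2KCal.values() )    # a list copy does not follow later changes
day2KCal[ "Thu" ] = 2290
print( len( kCals ), len( snapshot ) )  # 4 3
```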

An example of `values()` used in a list comprehension:

In [None]:
kJoules = [kCal * 4.184 for kCal in day2KCal.values()]
kJoules

`items()` provides a view of the dictionary items; iterating over it yields two-element tuples (key, value).  
Here, it is used to create another dictionary:

In [None]:
day2KCal.items()                  # a view of (day, kCal) tuples
day2KJoule = { day: kCal * 4.184 for day, kCal in day2KCal.items()}
day2KJoule

It is also possible to iterate only over `keys()`.  
The following example creates a tuple of all dictionary keys:

In [None]:
day2KCal.keys()                   # a view of the keys
days = tuple( day2KCal.keys() )   # iterates over the view and produces a tuple
days
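
Iterating over a dictionary itself also goes over its keys, so `keys()` is often not written explicitly. A short sketch (keys come out in insertion order):

```python
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
dayList = list( day2KCal )        # ['Mon', 'Tue', 'Wed'], same as list( day2KCal.keys() )
for day in day2KCal:              # a loop over a dict iterates over its keys
    print( day, day2KCal[ day ] )
```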

### Dict: a loop over elements

`items()`, `values()`, `keys()` can also be used in loops. For example:

In [None]:
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
for day, kCal in day2KCal.items():
    print( "On", day, "consumed food was", kCal, "kCal or", kCal * 4.184, "kJ." )
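
The loop above visits items in insertion order. To visit them ordered by value instead, one option is `sorted(...)` with a `key` function; `max(...)` with `key=day2KCal.get` finds the key with the largest value. A sketch (`byValue` and `maxDay` are illustrative names):

```python
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
# sort the (day, kCal) pairs by their second element, the kCal value
byValue = sorted( day2KCal.items(), key=lambda item: item[1] )
for day, kCal in byValue:
    print( day, kCal )                          # Tue 1990, Wed 2150, Mon 2330
maxDay = max( day2KCal, key=day2KCal.get )      # the key with the largest value
print( maxDay )                                 # Mon
```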

## Formatting strings

Compare the `print` command from the previous example with the print below which uses *f-strings (formatted string literals)*.

In [None]:
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
for day, kCal in day2KCal.items():
    print( f"On {day} consumed food was {kCal} kCal or {kCal * 4.184} kJ." )

In *f-strings* `f"`...`"` variables/expressions can be put between `{`...`}` to have their values inserted into the surrounding text.

In [None]:
x = "Statistics"
y = "Data Science"
f"{x} and {y}"

In [None]:
f"{x}" " and " f'{y}'            # Note, there are several strings here
                                 # and they get concatenated

In [None]:
( f"{x}"
" and "
f"{y}" )                         # The above example in a multiline version; the enclosing ( ... ) let it span several lines

The *f-strings* notation allows specifying rounding and alignment of the inserted text.  
Consider the following examples:

In [None]:
from math import pi
{ 
    "               full precision":  f"{pi}",
    "                  four digits":  f"{pi:.4f}",
    "four digits and forced + sign":  f"{pi:+.4f}",
    "                right aligned":  f"{pi:12.4f}",
    "               center aligned":  f"{pi:^12.4f}",
    "                 left aligned":  f"{pi:<12.4f}",
    "            exponent notation":  f"{pi * 1000:.4e}",
    "        with comma separators":  f"{pi * 1e6:,.2f}",     # Note: the comma here is ok
}
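
Applied to the earlier calorie loop, a format specifier such as `:.1f` rounds the kJ values to one decimal place. A minimal sketch collecting the lines into a list (`lines` is an illustrative name):

```python
day2KCal = { "Mon": 2330, "Tue": 1990, "Wed": 2150 }
lines = []
for day, kCal in day2KCal.items():
    # :.1f rounds the kJ value to one decimal place
    lines.append( f"On {day} consumed food was {kCal} kCal or {kCal * 4.184:.1f} kJ." )
print( "\n".join( lines ) )
```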

## Self-study tasks

### Generating random integers

Use `from random import randint` to make the function `randint` available.  
Try `randint` to generate a random number in the range `[0,...,9]` (both ends inclusive).  
Write a list comprehension to generate a list with 20 random numbers in the range.  
Store the generated list in the variable `vs`.  

*Note:* The `range(...)` function might be useful in the comprehension.

*Question:* What would be `set(vs)`?

In [None]:
# SOLUTION
from random import randint
vs = [randint( 0, 9 ) for x in range(0, 20)]
print( vs )

# set(vs) contains the unique values of vs (each value occurring at least once)
print( set(vs) )
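
For reproducible results (e.g. when comparing solutions), the generator can be seeded with `random.seed(...)`. A sketch: after setting the same seed, `randint` produces the same sequence again (the seed value 42 is arbitrary):

```python
from random import randint, seed

seed( 42 )                                       # fix the generator state
first = [ randint( 0, 9 ) for _ in range( 20 ) ]
seed( 42 )                                       # reset to the same state ...
second = [ randint( 0, 9 ) for _ in range( 20 ) ]
print( first == second )                         # ... the same list is produced: True
```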

### Counting elements

A list of many elements is provided in the variable `vs` (identical elements may occur multiple times in the list).  
(*Note:* Such a list can be the output of the previous exercise, but it could also be, for example, a list of text strings.)

Write your own element *counter* which produces a dictionary with keys being the unique elements of the `vs` list and the values representing their counts.  
Store the dictionary in `v2cnt`.

*Note:* There are many solutions. You may first build a set of unique `vs` elements. Then initialize a dictionary for these elements with `0` count values. Finally, in a `for` loop increase the counts in the dictionary.

In [47]:
# For example, for this list:
vs = [ 1, 7, 3, 1, 0, 2, 5, 7, 8, 2, 6, 2, 4, 1, 5, 5, 0, 2, 5, 4, 1, 3, 1, 8, 2, 1 ]

# The solution should be (0 is present twice, 1 is six times, 2 five times...)
# v2cnt = {0: 2, 1: 6, 2: 5, 3: 2, 4: 2, 5: 4, 6: 1, 7: 2, 8: 2}

In [None]:
# SOLUTION
uniqueVs = set( vs )
v2cnt = { v: 0 for v in uniqueVs }
for v in vs:
    v2cnt[v] += 1
v2cnt
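
Two alternative sketches (the names `v2cntGet` and `v2cntCounter` are illustrative): `dict.get` with a default of `0` avoids pre-filling the keys, and the standard library's `collections.Counter` does the counting directly:

```python
from collections import Counter

vs = [ 1, 7, 3, 1, 0, 2, 5, 7, 8, 2, 6, 2, 4, 1, 5, 5, 0, 2, 5, 4, 1, 3, 1, 8, 2, 1 ]

# option 1: dict.get with a default of 0
v2cntGet = {}
for v in vs:
    v2cntGet[v] = v2cntGet.get( v, 0 ) + 1

# option 2: Counter (a dict subclass) counts the elements
v2cntCounter = Counter( vs )
print( v2cntCounter.most_common( 2 ) )     # [(1, 6), (2, 5)]
```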

### Removing from a dictionary all items of given values

Given is a dictionary `licPlate2color` with colors of cars corresponding to some license plates.  
Create a filtered dictionary `lp2c` which does not contain cars of colors listed in `toRemoveColors`.

In [49]:
licPlate2color = {
    "VA-111-V": "silver", "SB-222-W": "red", "XC-333-L": "red",
    "AB-111-E": "white", "ER-222-U": "black", "BV-333-Z": "white",
    "CC-111-J": "silver", "UI-222-R": "green", "GF-333-U": "silver",
    "WT-111-K": "white", "KJ-222-Q": "silver", "LK-333-I": "black",
}
toRemoveColors = [ "white", "silver", "black" ]

In [None]:
# SOLUTION
toRemove = set( toRemoveColors )      # build the set once instead of once per item
lp2c = {lp: c for lp, c in licPlate2color.items() if c not in toRemove}
lp2c

### Build a dictionary with values being lists

Some people are described at the same positions in two lists: `names` and `countries` (see below).  
Build a dictionary `country2names` with values being lists of `names` of people from a country given by the key.  
For example: `country2names["nl"]` should be a list `['Jeroen', 'Sanne']`.

*Note:* A possible solution may use `defaultdict` imported from the `collections` module.

In [51]:
names =     ["Grzegorz", "Małgorzata", "Paweł", "Jeroen", "Sanne", "Ana", "Sofia", "Javier", "Sofia"]
countries = ["pl",       "pl",         "pl",    "nl",     "nl",    "es",  "es",    "es",     "es"]

In [None]:
# SOLUTION
uniqueCountries = set( countries )     # this produces countries without duplicates
country2names = {uc:[n for n, c in zip( names, countries ) if c == uc] for uc in uniqueCountries}
country2names

In [None]:
# SOLUTION (another option with defaultdict)
from collections import defaultdict
country2names = defaultdict( list )
for c, n in zip( countries, names ):
    country2names[c].append( n )
country2names
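
A third option without any import uses `dict.setdefault(key, default)`, which returns the existing list for a known country or first inserts an empty list for a new one. A sketch producing a plain `dict`:

```python
names =     ["Grzegorz", "Małgorzata", "Paweł", "Jeroen", "Sanne", "Ana", "Sofia", "Javier", "Sofia"]
countries = ["pl",       "pl",         "pl",    "nl",     "nl",    "es",  "es",    "es",     "es"]

country2names = {}
for c, n in zip( countries, names ):
    # returns the list stored for c, inserting [] first if c is not a key yet
    country2names.setdefault( c, [] ).append( n )
print( country2names )
```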

### Reverse a dictionary and sample elements

A volleyball team has 10 players, each with a different name. The players specialize and have their own roles.
The `player2role` dictionary provides the roles of the players.

For a volleyball set, the team needs to select 6 players with the following roles:
- one `setter`,
- one `dia`,
- two `middle` attackers,
- two `outside` attackers.

The number of players needed for each role is given by the `role2num` dictionary.

From the complete team, choose 6 players randomly, but so that each role has the correct number of players.  
The result should be a dictionary (keys are the roles, values are lists of chosen players).

*Hint:* `from random import sample` (search for documentation of the function `sample`)

*Hint:* Based on the previous exercise, first build a `role2players` dictionary (the reverse of `player2role`).


In [54]:
from random import sample

role2num = { "setter": 1, "dia": 1, "middle": 2, "outside": 2 }

player2role = { 
    "Chen": "setter", "Martijn": "setter",
    "Marnick": "dia", "Simon": "dia",
    "David": "middle", "Luuk": "middle",
    "Ronald": "outside", "Alex": "outside", "Kadir": "outside", "Koen": "outside"
}

In [None]:
# SOLUTION
role2players = {selRole:[p for p, r in player2role.items() 
                          if r == selRole] for selRole in role2num.keys()}
{r:sample(role2players[r], n) for r, n in role2num.items()}
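
Another sketch of the same task, reversing `player2role` with `defaultdict` as in the previous exercise (`lineup` is an illustrative name). A final `assert` checks that every chosen player really has the role they were chosen for:

```python
from collections import defaultdict
from random import sample

role2num = { "setter": 1, "dia": 1, "middle": 2, "outside": 2 }
player2role = {
    "Chen": "setter", "Martijn": "setter",
    "Marnick": "dia", "Simon": "dia",
    "David": "middle", "Luuk": "middle",
    "Ronald": "outside", "Alex": "outside", "Kadir": "outside", "Koen": "outside"
}

# reverse the dictionary: role -> list of players
role2players = defaultdict( list )
for player, role in player2role.items():
    role2players[role].append( player )

# sample the required number of distinct players per role
lineup = { r: sample( role2players[r], n ) for r, n in role2num.items() }
print( lineup )

assert all( player2role[p] == r for r, ps in lineup.items() for p in ps )
```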

### NATO phonetic alphabet

When spelling a word, the following `codeWords` are used to represent letters (`a` is pronounced `alpha`, `b` is `bravo`, ...).

Build a dictionary `chr2codeWord` which will map a letter `chr` to its code word `chr2codeWord[chr]`.  
For example, `chr2codeWord['s']` should be equal to `sierra`.  
Do not type the content manually - use `split` to get a list of words from `codeWords` and then write a comprehension to build a dictionary mapping the first letter (`codeWord[0]`) to the corresponding `codeWord`.

Finally, given `word = "statistics"`, write a comprehension mapping the letters of `word` to a list of code words in the order in which they should be spelled (`sierra`, `tango`, `alpha`, `tango`, ...).

In [None]:
codeWords = "alpha bravo charlie delta echo foxtrot golf hotel india juliet kilo lima mike november oscar papa quebec romeo sierra tango uniform victor whiskey x-ray yankee zulu"
codeWords

In [None]:
# SOLUTION
chr2codeWord = { cw[0]: cw for cw in codeWords.split( ' ' ) }
word = "statistics"
[chr2codeWord[letter] for letter in word]     # 'letter' avoids shadowing the built-in chr()
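
For text that contains capitals, spaces or digits, one possible extension is to lower-case the text and use `get(c, c)`, so characters without a code word are kept as they are. A sketch on the illustrative text `"Hi 5"`:

```python
codeWords = "alpha bravo charlie delta echo foxtrot golf hotel india juliet kilo lima mike november oscar papa quebec romeo sierra tango uniform victor whiskey x-ray yankee zulu"
chr2codeWord = { cw[0]: cw for cw in codeWords.split() }

text = "Hi 5"
# characters without a code word (space, digits) are returned unchanged
spelled = [ chr2codeWord.get( c, c ) for c in text.lower() ]
print( spelled )    # ['hotel', 'india', ' ', '5']
```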

### Read weather data

Read the Open-Meteo documentation page:  
https://open-meteo.com/en/docs#latitude=52.16&longitude=4.49&hourly=temperature_2m.  

On that page you may generate a URL link providing access to weather predictions, for example for temperature in Leiden:  
https://api.open-meteo.com/v1/forecast?latitude=52.16&longitude=4.49&hourly=temperature_2m

The data is provided in the [JSON](https://en.wikipedia.org/wiki/JSON) format which can be easily read in Python using the `json` and `urllib.request` libraries.

Use the following code to get the `data`. Find out how to extract the list with temperatures and the list with corresponding time point texts. Finally, build a dictionary object `time2temp` mapping the time point text to the temperature.

In [58]:
import urllib.request, json
url = "https://api.open-meteo.com/v1/forecast?latitude=52.16&longitude=4.49&hourly=temperature_2m"
with urllib.request.urlopen( url ) as response:   # the connection is closed after reading
    data = json.load( response )
# print(data)          # use this to print the nested dictionary with data

In [None]:
# SOLUTION
time2temp = dict( zip( data['hourly']['time'], data['hourly']['temperature_2m'] ) )
time2temp
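
The same steps can be tried offline on a small, made-up JSON text with the layout used above (`data['hourly']['time']` and `data['hourly']['temperature_2m']`). The times and temperatures below are invented for illustration, not real forecast values; `max(..., key=...)` then finds the warmest time point:

```python
import json

# Hypothetical, shortened response; the values are invented for illustration
sampleText = '''{ "hourly": { "time": ["2024-01-01T00:00", "2024-01-01T01:00", "2024-01-01T02:00"],
                             "temperature_2m": [5.1, 4.8, 4.6] } }'''
sampleData = json.loads( sampleText )
sampleTime2temp = dict( zip( sampleData['hourly']['time'], sampleData['hourly']['temperature_2m'] ) )

warmest = max( sampleTime2temp, key=sampleTime2temp.get )   # key with the highest temperature
print( warmest, sampleTime2temp[ warmest ] )                # 2024-01-01T00:00 5.1
```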