# Announcements

* Homework #10 will be posted today.  __This is the last homework__

# Dictionaries and sets

<a href="https://en.wikipedia.org/wiki/Library_catalog" target="_blank"><img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/Yale_card_catalog.jpg" width=500px /></a>

## PHYS 2600: Scientific Computing

## Lecture 22

## Advanced data collections

Python has a couple more really useful built-in data structures we haven't seen yet - both for data that is __not looked up by position__ (there is no "item number 3").

Let's start with __sets__.  A set (A) has _no order_, and (B) only contains _unique_ values.  Sets are created using curly brackets `{}` or cast from other types of collections (e.g. a list) using `set()`:

In [None]:
my_set = { 1, 4, 9, 4, 16, 1}
print(my_set)
print(set('elephant'))

Notice that because __sets never contain duplicates__, even though 1 and 4 appeared twice in our initial assignment of the set, when we print it out they only appear once.  (Ditto for `'e'` in the 'elephant' set.)
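One catch with the curly-bracket syntax: an empty `{}` creates an empty _dictionary_ (covered below), not an empty set, so an empty set has to be made with `set()`. The sketch below shows that, plus two everyday set habits: adding a value that's already present changes nothing, and `set()` is a quick way to get the unique values of a list. (The list `readings` is made up for illustration.)

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))  # <class 'dict'>  (not a set!)
print(type(empty_set))     # <class 'set'>

primes = {2, 3, 5}
primes.add(3)              # 3 is already present, so the set is unchanged
primes.add(7)
print(len(primes))         # 4

# Hypothetical measurements with repeats; set() keeps one copy of each value
readings = [3, 1, 3, 2, 1, 3]
unique_readings = set(readings)
print(sorted(unique_readings))  # [1, 2, 3]
```

Since a set has no order, `sorted()` is used here just to print the values in a predictable order.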

We can do all the standard set-theory operations in Python, given two sets:

In [None]:
set_A = {'apple', 'orange', 'banana'}
set_B = {'kiwi', 'apple', 'pear'}

print(set_A | set_B)  ## union
print(set_A & set_B)  ## intersection
print(set_A - set_B)  ## "in A, not in B"
print(set_B - set_A)  ## "in B, not in A"
print(set_A ^ set_B)  ## "in A or in B, but not in both"
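Each of these operators also has a named method, which some people find easier to read. This sketch checks the method versions against the operators, confirms that the symmetric difference `^` is "the union minus the intersection", and shows the `in` test for membership:

```python
set_A = {'apple', 'orange', 'banana'}
set_B = {'kiwi', 'apple', 'pear'}

# Named-method equivalents of |, &, - and ^
print(set_A.union(set_B) == set_A | set_B)                 # True
print(set_A.intersection(set_B) == set_A & set_B)          # True
print(set_A.difference(set_B) == set_A - set_B)            # True
print(set_A.symmetric_difference(set_B) == set_A ^ set_B)  # True

# Symmetric difference: everything in A or B, minus what they share
print((set_A ^ set_B) == (set_A | set_B) - (set_A & set_B))  # True

# Membership test
print('kiwi' in set_A, 'kiwi' in set_B)  # False True
```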

We won't spend much time on sets - they are the least common structure used for scientific computing.  A more in-depth tutorial (with Venn diagrams!) [can be found here](https://www.programiz.com/python-programming/set).

## Dictionaries

The __dictionary__ (or "dict" for short) is also a collection without positions, but like a list, its items are found through an index: in this case an _index of names_ (called the __keys__ of the dictionary).  While the set has some niche uses, the dictionary is one of the most useful "glue" data types that Python contains - you'll reach for it frequently!

A new dictionary is also created with curly brackets `{}`, but instead of single items, each entry consists of a paired __key__ and __value__, separated by a colon `:`.

In [None]:
my_dict = {
    'apples': 7,
    'oranges': 3,
    'kiwis': 0,
    4: [2.3,-1.1,4.7],  # trailing comma by convention
}
print(my_dict)
print(my_dict['apples'])
print(my_dict[4][0])

You can see that we index a dictionary with `[]` just like a list or array.  But the indices are the keys themselves, not positions: they can be Python objects of many types - usually strings, but not always (here the integer `4` is also a key).
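Because the keys _are_ the indices, `my_dict[4]` means "the value stored under the key `4`", not "the fifth item", and asking for a key that isn't there raises a `KeyError`. A small sketch with a trimmed-down version of `my_dict`:

```python
my_dict = {'apples': 7, 'oranges': 3, 4: [2.3, -1.1, 4.7]}

print(my_dict[4])          # [2.3, -1.1, 4.7]  (4 is a key here, not a position)

missing = None
try:
    my_dict[0]             # there is no key 0, and no "item number 0" either
except KeyError as err:
    missing = err.args[0]  # the KeyError carries the key that was not found
    print('KeyError:', missing)  # KeyError: 0
```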

In other programming languages, the Python dictionary might be called a __lookup table__ or a __key-value store__.  Here's a cartoon of how the storage in a dictionary works:
<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/dicts-1.png" target="_blank">

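Like a list, a dictionary object like `my_dict` points to a collection of names that in turn point to other Python objects.  Unlike a list, _there are no positions_: you always look items up by key, never by "item number 3".  (Since Python 3.7, a dictionary does remember the order in which its keys were inserted, which is the order you see when you print it, but that order is not used for lookups.)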

A good way to think of a `dict` is as a (one-way!) __map__ between two sets of Python objects.
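To see why the map is one-way: going from a key to its value is a single lookup, but going from a value back to its key means building a second dictionary (or searching through every item). Here is a minimal sketch using `.items()`, which gives `(key, value)` pairs; the dictionary `symbols` is a made-up example:

```python
symbols = {'H': 1, 'He': 2, 'Li': 3}

# Forward direction: one lookup
print(symbols['He'])  # 2

# Reverse direction: build a new dict that maps each value back to its key
atomic_number_to_symbol = {}
for symbol, number in symbols.items():
    atomic_number_to_symbol[number] = symbol

print(atomic_number_to_symbol[3])  # Li
```

This inversion only works cleanly because every value is unique: if two keys shared a value, the reversed dictionary would keep only the last one assigned.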

Also like lists, dictionaries are __mutable__, and assigning a dictionary to a new name does _not_ copy it: both names point to the same dictionary object, so modifying it through one name changes what you see through the other.

In [None]:
copy_dict = my_dict
print(copy_dict)
copy_dict['oranges'] = 32
copy_dict['bananas'] = 99
print(copy_dict)
print(my_dict)

If we want to avoid this behavior, we can use the `.copy()` method, just like we do on lists, to get an independent dictionary.
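A short sketch of `.copy()`. One caveat worth knowing: like `list.copy()`, it makes a _shallow_ copy, so if a value is itself mutable (such as the list stored under the key `4` in `my_dict`), both dictionaries still share that inner object:

```python
original = {'oranges': 3, 4: [2.3, -1.1, 4.7]}
independent = original.copy()

independent['oranges'] = 32     # reassigning a key only affects the copy
print(original['oranges'])      # 3

independent[4].append(9.9)      # mutating the shared inner list affects both
print(original[4])              # [2.3, -1.1, 4.7, 9.9]
```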

Note that because items are found by key rather than by position, there is no "append" function for dictionaries!  If we want to add something to a dict, we just __assign to that key__, whether it exists or not: a new key is created, and an existing key has its value replaced.
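A common pattern combines "assign to the key" with the `in` test to count things. Here is a sketch that counts the letters in `'elephant'`:

```python
letter_counts = {}
for letter in 'elephant':
    if letter in letter_counts:
        letter_counts[letter] = letter_counts[letter] + 1  # key exists: overwrite its value
    else:
        letter_counts[letter] = 1                          # new key: create it

print(letter_counts['e'])  # 2
print(letter_counts['p'])  # 1
```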

On the other hand, it still makes sense to __merge__ two dictionaries together, like `.extend()` does for lists.  To do so, we use the `.update()` method:

In [None]:
d1 = { 'x': 3, 'y': -1, 'z': 2 }
d2 = { 'u': 'True', 'v': 'False', 'x': 'True'}

d1.update(d2)
print(d1)

Notice that __this works like `.append()` and `.extend()`__, in that it doesn't return anything but rather _mutates the dictionary in place._  Also notice that when the two dictionaries conflict, the values in `d1` are __overwritten by `d2`__.  

(This is why the method is called `.update()`: because it "updates" the keys in `d1` with corresponding values in `d2`.)
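Because `.update()` changes `d1` itself, it is sometimes handier to leave both inputs alone and build a new merged dictionary. One way to do that with only the tools above is to copy first and then update the copy (in Python 3.9 and later, `d1 | d2` produces the same merged result as a new dictionary):

```python
d1 = { 'x': 3, 'y': -1, 'z': 2 }
d2 = { 'u': 'True', 'v': 'False', 'x': 'True'}

merged = d1.copy()   # independent dictionary, so d1 is not modified
merged.update(d2)    # on conflicting keys, the values from d2 win

print(merged)  # {'x': 'True', 'y': -1, 'z': 2, 'u': 'True', 'v': 'False'}
print(d1)      # {'x': 3, 'y': -1, 'z': 2}
```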

Dictionaries are _efficient_: looking up a key, or especially checking _whether a certain key exists_, is very fast.  This is because, like the index of a book, the keys are __arranged__ in a particular way, so Python can jump almost straight to the right entry instead of checking every one.

In [None]:
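d = {}
l = []
for i in range(1000000):
    d[i] = i
    l.append(i)

%timeit 99999 in l             # list: checks items one at a time until it finds a match
%timeit 99999 in d.keys()      # dict keys: hash lookup, roughly 10,000x faster (machine-dependent)
%timeit 99999 in d.values()    # dict values: no fast lookup, typically even slower than the list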
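`%timeit` is an IPython/Jupyter "magic" command, so it won't run in a plain `.py` script. A rough equivalent using the standard-library `timeit` module is sketched below. The absolute times depend on your machine, but the list search should come out far slower than the dictionary lookup. Note also that `target in d` checks the keys directly, exactly like `target in d.keys()`.

```python
import timeit

n = 100_000
d = {}
l = []
for i in range(n):
    d[i] = i
    l.append(i)

target = n - 1  # the last element: the worst case for a list search

# Total seconds for 100 repetitions of each membership test
t_list = timeit.timeit(lambda: target in l, number=100)
t_dict = timeit.timeit(lambda: target in d, number=100)

print(f'list: {t_list:.2e} s, dict: {t_dict:.2e} s')
print(f'dict is about {t_list / t_dict:.0f}x faster here')
```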


Since a dictionary's keys are not kept in alphabetical or numerical order, this arrangement is done in a way that's convenient for the _computer_ - not for us!  (It is based on a __hash__ of the keys: Python dictionaries are an example of a "hash table".)

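A hash function maps any sort of data, of any size, to a short code of a fixed size (in Python, an integer called the hash).  We can see what these codes look like for ourselves with the `hash()` function.  (Don't be surprised if `hash('q')` gives a different number each time you restart Python: string hashes are randomized per session.)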

In [None]:
import numpy as np

print(hash('q'))
print(hash(47))
print(hash((1,3,2,5)))
print(hash(np))

Hashing gives dictionaries flexibility: we can use any Python object that can be converted to a hash as a key, and the keys can still be "sorted" in a meaningful way for the computer to look them up.  (Not _all_ Python objects can be hashed: in particular, mutable built-ins such as lists, sets, and dictionaries can't be.  I won't go into detail, but if you get an error message that includes "unhashable", then you're trying to use something as a key that doesn't have a hash.)
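Here is a quick way to see the "unhashable" error, along with the usual fix: a list can't be a key, but a tuple with the same contents can. The `positions` dictionary is just an illustration:

```python
positions = {}

message = ''
try:
    positions[[0, 1]] = 'particle A'   # a list cannot be used as a key
except TypeError as err:
    message = str(err)
print(message)                         # mentions "unhashable type: 'list'"

positions[(0, 1)] = 'particle A'       # a tuple with the same contents works
print(positions[(0, 1)])               # particle A
```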

One downside to hashing is the possibility of __hash collision__, when two different objects map to the same hash.  In Python, dictionaries automatically detect and handle hash collisions (by also comparing the keys themselves), so you don't need to worry about them.
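Collisions really do happen, even among small integers. In CPython (the standard Python interpreter), `hash(-1)` and `hash(-2)` are both `-2`, because `-1` is reserved internally as an error code. A dictionary still keeps the two keys apart, because after matching the hash it also compares the keys with `==`:

```python
print(hash(-1), hash(-2))   # -2 -2 in CPython: a hash collision

d = {-1: 'minus one', -2: 'minus two'}
print(len(d))               # 2: both keys are stored
print(d[-1], d[-2])         # minus one minus two
```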

## Tutorial 22

Time for some practice with dictionaries!