Sets
====

We've seen a number of data structures in the lecture so far, for example:

* string: an ordered collection of characters (e.g. `'hello'`), so there is a character at index 0 (`'h'`), a character at index 1 (`'e'`), and so on.
* list: an ordered collection of anything (e.g. `[1, 2, 3, 'a', 1]`), so there is an item at index 0 (the number `1`), an item at index 1 (the number `2`), and so on.

Sets are collections as well, but unlike strings and lists, they are _unordered_.  You can think of them as a "bag" or mathematical set of objects, and so it doesn't make sense to talk about which item in the set is "first".

If we were to create a set out of our example list `[1, 2, 3, 'a', 1]`, we would add `1`, `2`, `3`, and `'a'`, but when we came to the second `1` there would already be a `1` in the set, so it would not be added again.  So an additional distinction between sets and ordered collections like strings and lists is that a set can't contain duplicates.

There's one other critical distinction between sets and lists: things stored in sets must be immutable (or technically, hashable).  So you can put strings, numbers and tuples into sets (provided everything inside the tuple is itself hashable), but not dictionaries or lists, since they are mutable.

So a set is an _unordered_ collection of _unique_, _immutable_ objects.
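
As a rough sketch of these three properties, the snippet below builds a set from the example list and tries a few operations.  The variable names `items` and `s` are chosen just for this illustration.

```python
items = [1, 2, 3, 'a', 1]
s = set(items)

# Unique: the second 1 is dropped, so the set is smaller than the list.
print(len(items), len(s))

# Unordered: there is no "first" item, so indexing is not supported.
try:
    s[0]
except TypeError as err:
    print('TypeError:', err)

# Hashable elements only: a tuple of numbers can be added, a list cannot.
s.add((4, 5))
try:
    s.add([4, 5])
except TypeError as err:
    print('TypeError:', err)
```

Both failing operations raise a `TypeError`, which is caught here so the whole snippet runs to the end.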

Constructing Sets
-----------------

You can create an empty set using the `set()` function:

In [3]:
s = set()
s

set()

If you have an existing collection, like a list, you can use it to create a set in a similar way:

In [4]:
t = set([1, 2, 3, 1])
t

{1, 2, 3}

Note that the duplicated 1 in the list is removed.  Also note that sets in Python are represented with braces (or curly brackets), similar in some ways to dictionaries, but dictionaries have _pairs_ of key and value for each item in the dictionary, while sets only have one value per entry.
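
The same approach should work for other collections besides lists.  As a small illustration (the names `letters` and `numbers` are made up for this example), here is a set built from a string and one built from a tuple:

```python
# set() accepts other collections too; here a string and a tuple.
letters = set('hello')       # the repeated 'l' is stored only once
numbers = set((1, 2, 3, 1))  # same elements as the list example above

print(len(letters))
print(numbers == set([1, 2, 3]))
```

A set built from a string contains its individual characters, not the string as a whole.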

You can use braces to create a set as well:

In [5]:
t = {1, 2, 3, 1}
t

{1, 2, 3}

but you have to be careful, because you can't create an empty set this way:

In [6]:
s = {}
print(s)
type(s)

{}


dict

The empty braces give you a dictionary, rather than a set. There is some ambiguity in the meaning of `{}`, but dictionaries were introduced into Python long before sets were, so the default is to create a dictionary.

So if you want an empty set, you have to use `set()`.

In [None]:
s = set()
s

Examples of Usage
-----------------

A very common use case for sets is as a convenient way of removing duplicate elements from a list.  For example, you might have a list of e-mail addresses that you've collected from people visiting your store or website:

In [8]:
email_list = ['joe@acme.com', 'sue@cooks.com', 'bud@plumbers.com',
              'bugs@cartoon.com', 'betty@cartoon.com',
              'joe@acme.com', 'bugs@cartoon.com']

Note that Joe's e-mail address appears twice, and so does Bugs's.  You don't want to send the same message twice to the same person, so you can remove the duplicates using `set()`:

In [9]:
emails = set(email_list)
emails

{'betty@cartoon.com',
 'bud@plumbers.com',
 'bugs@cartoon.com',
 'joe@acme.com',
 'sue@cooks.com'}
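
If you want to check how many duplicates were dropped, or need a list in a predictable order again, one possible follow-up is sketched below.  It repeats the list so that it runs on its own; the names `n_duplicates` and `mailing_list` are invented for this example.

```python
email_list = ['joe@acme.com', 'sue@cooks.com', 'bud@plumbers.com',
              'bugs@cartoon.com', 'betty@cartoon.com',
              'joe@acme.com', 'bugs@cartoon.com']
emails = set(email_list)

# The difference in length is the number of repeated entries removed.
n_duplicates = len(email_list) - len(emails)
print(n_duplicates)

# A set has no order, so sort it if you need a predictable list.
mailing_list = sorted(emails)
print(mailing_list)
```

Note that `sorted()` returns a new list and leaves the set itself unchanged.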

Now if you imagine that there are multiple websites, you might want to combine the email lists for each site to create a single mailing list, or you might want to do an analysis where you work out which people visit multiple sites, or visit one site but not another one, and so on.

It turns out that sets have some nice tools to do this sort of calculation easily, and we'll take a look at those in the next lecture.

Copyright 2008-2016, Enthought, Inc.<br>Use only permitted under license.  Copying, sharing, redistributing or other unauthorized use strictly prohibited.<br>http://www.enthought.com