# Sets

* unordered collections of unique elements
* duplicates are dropped automatically, which makes sets great for getting the unique items out of a collection
* written with curly braces: {3, 6, 7}. Dictionaries use {} as well, but dictionary entries are key:value pairs separated by a colon
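
Because sets and dictionaries share curly braces, an empty pair of braces is ambiguous. The short sketch below (variable names are chosen only for illustration) shows that Python resolves it in favour of the dictionary, so an empty set has to be created with `set()`.

```python
# {} creates an empty dictionary, not an empty set
empty_dict = {}
empty_set = set()  # an empty set has no literal form, so call set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

# braces with elements but no colons create a set
numbers = {3, 6, 7}
print(type(numbers))     # <class 'set'>
```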

![Set](https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Venn_A_intersect_B.svg/440px-Venn_A_intersect_B.svg.png)

https://en.wikipedia.org/wiki/Set_theory

## Creating a set

In [1]:
s = {3,3,6,1,3,6,7}
print(s)

{1, 3, 6, 7}


In [2]:
# alternative syntax
nset = set((3,3,6,1,3,6,7))
nset

{1, 3, 6, 7}

In [3]:
a = set("pelmeņi un krējums") # set() accepts any iterable, so a string works: each character becomes an element
a

{' ', 'e', 'i', 'j', 'k', 'l', 'm', 'n', 'p', 'r', 's', 'u', 'ē', 'ņ'}

In [4]:
# passing different types of elements
b = {"abba", "baba", 1, 2, 3, 4, 5}
b

{1, 2, 3, 4, 5, 'abba', 'baba'}

In [5]:
bset = set(["abba", "baba"])
bset

{'abba', 'baba'}

In [11]:
aset = set("abracadbra") # a string contributes its characters; compare with bset above, which was built from a list
aset

{'a', 'b', 'c', 'd', 'r'}

Python sets have no defined order. Because the items have no index, you cannot access an item in a set by its position.
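
A minimal sketch of what happens when indexing is attempted anyway. The names `letters`, `error_message` and `first_letter` are illustrative only. If positional access is needed, one option is to convert the set to a sorted list first.

```python
letters = set("abracadabra")

try:
    letters[0]  # sets do not support indexing
except TypeError as e:
    error_message = str(e)
    print("Error:", error_message)

# for positional access, convert to a sorted list first
first_letter = sorted(letters)[0]
print(first_letter)  # a
```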

In [13]:
# looping through a set (the iteration order is arbitrary)
for i in aset:
    print(i)

r
a
c
b
d


## Membership testing in sets

In [10]:
# Lookup (membership testing) is very quick even for large sets
# In computer science terms it is an O(1) operation on average, so roughly constant time even with millions of elements
# It is much faster than searching a list
'a' in a, 'a' in b, 'abba' in b

(False, False, True)

In [14]:
# if you need a sorted list from a set
# then use the sorted function, which returns a list
mylist = sorted(aset) # sorted gives you a list
mylist

['a', 'b', 'c', 'd', 'r']

In [15]:
# list lookup is linear, O(n), so it gets much slower as the list grows (say, beyond 10_000 items)
'a' in mylist, 'b' in mylist, 'f' in mylist 

(True, True, False)
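
The speed difference between set and list lookup can be measured with the standard `timeit` module. This is only a rough sketch: the collection size and the repeat count are arbitrary choices for illustration, and the exact timings depend on the machine.

```python
import timeit

n = 100_000  # arbitrary size chosen for illustration
big_list = list(range(n))
big_set = set(big_list)
missing = -1  # not present, so the list has to check every element

# run each membership test 100 times and report the total time
list_time = timeit.timeit(lambda: missing in big_list, number=100)
set_time = timeit.timeit(lambda: missing in big_set, number=100)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```

On typical hardware the set lookup should come out several orders of magnitude faster, and the gap widens as `n` grows.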

In [16]:
type(s), type(aset)

(set, set)

In [17]:
"|".join(sorted(a)) # you can join with any separator, even a blank space
# notice that sorting uses Unicode code points, so Latvian letters such as ē and ņ come after all the English ones
# TODO sort it in a locale-specific way

' |e|i|j|k|l|m|n|p|r|s|u|ē|ņ'
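
One possible way to get Latvian alphabetical order without relying on an installed system locale is a custom sort key. The sketch below is a simplification: `LATVIAN_ALPHABET`, `LETTER_RANK` and `latvian_key` are hypothetical names introduced here, the key handles single lowercase characters only, and real collation rules are more involved. The standard library's `locale.strxfrm` is a more general option when a Latvian locale is available on the system.

```python
# Latvian alphabet in dictionary order (lowercase only)
LATVIAN_ALPHABET = "aābcčdeēfgģhiījkķlļmnņoprsštuūvzž"
# map each letter to its position in the alphabet
LETTER_RANK = {ch: i for i, ch in enumerate(LATVIAN_ALPHABET)}

def latvian_key(ch):
    # characters outside the alphabet (such as a space) sort first
    return LETTER_RANK.get(ch, -1)

letters = set("pelmeņi un krējums")
print("|".join(sorted(letters, key=latvian_key)))
#  |e|ē|i|j|k|l|m|n|ņ|p|r|s|u
```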

## Set as a way to remove duplicates from a list
Sets offer an easy way to remove duplicates from a list. Just convert the list to a set and then back to a list. Note that the original order of the items is not preserved.

In [18]:
shopping_list = ["apple","banana", "carrot", "banana", "apple", "banana", "pumpkin","candy", "apple"]
shopping_set = set(shopping_list)
unique_items = list(shopping_set)

# let's print all three
print("Original Shopping list:\n",shopping_list)
print("Unique items set:\n",shopping_set)
print("Unique items list:\n",unique_items)

Original Shopping list:
 ['apple', 'banana', 'carrot', 'banana', 'apple', 'banana', 'pumpkin', 'candy', 'apple']
Unique items set:
 {'apple', 'banana', 'pumpkin', 'carrot', 'candy'}
Unique items list:
 ['apple', 'banana', 'pumpkin', 'carrot', 'candy']


In [19]:
# we could have done this in one line
unique_items = list(set(shopping_list))
print("Unique items list:\n",unique_items)

Unique items list:
 ['apple', 'banana', 'pumpkin', 'carrot', 'candy']
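
The set round trip loses the original order of the list, as the output above shows. When order matters, a common alternative is `dict.fromkeys`, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7). This is a sketch of that alternative, not a set operation; `unique_in_order` is an illustrative name.

```python
shopping_list = ["apple", "banana", "carrot", "banana", "apple",
                 "banana", "pumpkin", "candy", "apple"]

# dictionary keys are unique and keep first-seen order
unique_in_order = list(dict.fromkeys(shopping_list))
print(unique_in_order)
# ['apple', 'banana', 'carrot', 'pumpkin', 'candy']
```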


## Set operations

Sets offer a wide range of set algebra operations. Here are some of the most common ones:

* issubset
* issuperset
* union
* intersection
* difference
* symmetric difference

In [21]:
# range is a sequence of numbers so we can convert it to a set
n_3_7 = set(range(3,8))
n_3_7

{3, 4, 5, 6, 7}

In [22]:
nset = set(range(10))
print("number set is", nset)
n_3_7.issubset(nset)

number set is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


True

In [23]:
# Alternative syntax
n_3_7 < nset # strict (proper) subset: True only if n_3_7 is a subset of nset and the two are not equal

True

In [27]:
n_3_7 <= nset  # same as issubset, i.e. the non-strict test
# Equal sets count as subsets of each other

True

In [25]:
nset < nset # strict subset: False because a set is never a proper subset of itself

False

In [26]:
nset <= nset # same as issubset: every set is a subset of itself

True

In [29]:
s.add(65) # we can add elements
print(s)
try:
    s.remove(65) # we can remove elements
except KeyError as e:
    print("Error:", e)
print(s)

{65, 1, 3, 6, 7}
{1, 3, 6, 7}
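
The `try`/`except` above guards against `remove` raising `KeyError` when the element is missing. The `discard` method does the same removal but stays silent for missing elements. A small sketch of the difference, using an illustrative set called `numbers`:

```python
numbers = {1, 3, 6, 7}

numbers.discard(65)  # 65 is absent: nothing happens, no exception
print(numbers)       # {1, 3, 6, 7}

try:
    numbers.remove(65)  # 65 is absent: raises KeyError
except KeyError as e:
    missing_key = e.args[0]
    print("KeyError for element:", missing_key)
```

Use `remove` when a missing element indicates a bug, and `discard` when it is fine for the element to be absent already.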


In [30]:
nset.issuperset(s)

True

## Union

The union of two sets is a set containing all elements that are in either set.

In [32]:
n_5_9 = set(range(5,10))
print("n_5_9 is", n_5_9)
print("Union of n_3_7 and n_5_9 is", n_3_7.union(n_5_9))

n_5_9 is {5, 6, 7, 8, 9}
Union of n_3_7 and n_5_9 is {3, 4, 5, 6, 7, 8, 9}


In [33]:
# Shorter union syntax is the pipe operator
n_3_7 | n_5_9 # a set of all elements found in either set

{3, 4, 5, 6, 7, 8, 9}

## Intersection

The intersection of two sets is a set containing all elements that are in both sets.

In [34]:
n_3_7.intersection(n_5_9)

{5, 6, 7}

In [35]:
# Syntactic sugar for intersection is ampersand 
n_3_7 & n_5_9

{5, 6, 7}

In [36]:
# To store the values in the intersection
n_5_7 = n_3_7 & n_5_9
n_5_7

{5, 6, 7}

## Difference

The difference of two sets is the set of elements that are in the first set but not in the second.

Set difference is therefore not commutative: the order of the operands matters.

In [37]:
# Shows only the elements unique to the left set
n_3_7.difference(n_5_9)

{3, 4}

In [38]:
# Syntactic sugar for difference is minus operator
n_3_7 - n_5_9, n_5_9 - n_3_7

({3, 4}, {8, 9})
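
Symmetric difference, listed among the set operations earlier, is not demonstrated in the cells above. It contains the elements that are in exactly one of the two sets, so unlike plain difference it is commutative. A minimal sketch using the same two ranges, with results sorted so the printed order is predictable:

```python
n_3_7 = set(range(3, 8))
n_5_9 = set(range(5, 10))

print(sorted(n_3_7.symmetric_difference(n_5_9)))  # [3, 4, 8, 9]
print(sorted(n_3_7 ^ n_5_9))                      # operator form, same result

# equivalent to the union minus the intersection
print(sorted((n_3_7 | n_5_9) - (n_3_7 & n_5_9)))  # [3, 4, 8, 9]
```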

## Updating sets

In [39]:
# we can update a single set from many different data types at once, as long as each argument is an iterable
# note that a string such as "Abba" is added character by character
s.update({3,3,6,2,7,9},range(4,15), [3,6,7,"Valdis", "Badac","Valdis"],"Abba")
s

{1,
 10,
 11,
 12,
 13,
 14,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 'A',
 'Badac',
 'Valdis',
 'a',
 'b'}

In [40]:
dir(s)

['__and__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']
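
Several names in this listing end in `_update`. These are in-place versions of the operations shown earlier: they modify the set itself instead of returning a new one. A brief sketch with an illustrative set called `evens`:

```python
evens = {0, 2, 4, 6, 8}

evens.intersection_update(range(4, 10))  # keep only elements also in 4..9
print(sorted(evens))  # [4, 6, 8]

evens.difference_update({6})  # drop elements found in the other set
print(sorted(evens))  # [4, 8]

evens.symmetric_difference_update({8, 10})  # keep elements in exactly one set
print(sorted(evens))  # [4, 10]

evens |= {12}  # operator form of update
print(sorted(evens))  # [4, 10, 12]
```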

In [41]:
# we can check whether our set has anything in common with another collection
n_3_7.isdisjoint(n_5_9) # False because the sets share 5, 6 and 7

False

In [42]:
n_8_9 = set((8,9))
n_8_9

{8, 9}

In [43]:
n_3_7.isdisjoint(n_8_9)

True

In [44]:
sentence = "a quick brown fox jumped over a sleeping dog which is not a normal dog"
words = sentence.split()
words

['a',
 'quick',
 'brown',
 'fox',
 'jumped',
 'over',
 'a',
 'sleeping',
 'dog',
 'which',
 'is',
 'not',
 'a',
 'normal',
 'dog']

In [45]:
unique_words_set = set(words)
unique_words_set

{'a',
 'brown',
 'dog',
 'fox',
 'is',
 'jumped',
 'normal',
 'not',
 'over',
 'quick',
 'sleeping',
 'which'}

In [46]:
unique_words_list = list(unique_words_set)
unique_words_list

['over',
 'dog',
 'sleeping',
 'brown',
 'a',
 'not',
 'normal',
 'which',
 'is',
 'quick',
 'jumped',
 'fox']
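
Besides collecting the unique words, a set can also serve as a fast "already seen" tracker for finding which words repeat. The sketch below assumes the same sentence; `seen` and `repeated` are names introduced here for illustration.

```python
sentence = "a quick brown fox jumped over a sleeping dog which is not a normal dog"
words = sentence.split()

seen = set()      # words encountered so far
repeated = set()  # words encountered more than once
for word in words:
    if word in seen:
        repeated.add(word)
    else:
        seen.add(word)

print(len(words), len(seen))  # 15 12
print(sorted(repeated))       # ['a', 'dog']
```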