# Introduction to Sets - Lab

## Introduction

Probability theory is all around. A common example is poker and related card games, where players try to calculate the probability of winning a round given the cards in their hands. Probabilities also play an important role in business: companies operating in a volatile economy need to take uncertainty into account, and this is exactly where probability theory comes in.

As mentioned in the previous lesson, a good understanding of probability starts with an understanding of sets and set operations. That's exactly what you'll learn in this lab!

## Objectives

You will be able to:

* Use Python to perform set operations
* Use Python to demonstrate the inclusion/exclusion principle


## Exploring Set Operations Using a Venn Diagram

Let's start with a pretty conceptual example. Let's consider the following sets:

   - $\Omega$ = positive integers in the interval [1, 12]
   - $A$ = even numbers in the interval [1, 10]
   - $B = \{3,8,11,12\}$
   - $C = \{2,3,6,8,9,11\}$
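
The two sets that are described in words can also be built from their descriptions instead of being typed out. This is a minimal sketch, assuming both intervals are inclusive; the names `omega` and `evens` are illustrative only and are not used elsewhere in this lab.

```python
# Universal set: positive integers from 1 to 12 inclusive
omega = set(range(1, 13))

# A: even numbers from 1 to 10 inclusive, built with a set comprehension
evens = {n for n in range(1, 11) if n % 2 == 0}

print(sorted(omega))
print(sorted(evens))  # [2, 4, 6, 8, 10]
```

`range` excludes its upper bound, which is why the stops are 13 and 11.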
    

#### a. Illustrate all the sets in a Venn Diagram like the one below. The rectangular shape represents the universal set.

<img src="./images/venn_diagr.png" width="600">


#### b. Using your Venn Diagram, list the elements in each of the following sets:

- $A \cap B$ = {8} (A = {2,4,6,8,10}, B = {3,8,11,12})

- $A \cup C$ = {2,3,4,6,8,9,10,11} (A = {2,4,6,8,10}, C = {2,3,6,8,9,11})

- $A^c$ = {1,3,5,7,9,11,12} (A = {2,4,6,8,10})

- The absolute complement of B = {1,2,4,5,6,7,9,10} (B = {3,8,11,12})

- $(A \cup B)^c$ = {1,5,7,9} ($A \cup B$ = {2,3,4,6,8,10,11,12})

- $B \cap C'$ = {12} (B = {3,8,11,12}, C' = {1,4,5,7,10,12})

- $A\backslash B$ = {2,4,6,10} (A = {2,4,6,8,10}, B = {3,8,11,12})

- $C \backslash (B \backslash A)$ = {2,6,8,9} ($B \backslash A$ = {3,11,12}, C = {2,3,6,8,9,11})

- $(C \cap A) \cup (C \backslash B)$ = {2,6,8,9} ($C \cap A$ = {2,6,8}, $C \backslash B$ = {2,6,9})


        
        
#### c. For the remainder of this exercise, let's create sets A, B and C and universal set U in Python and test out the results you came up with. Sets are easy to create in Python. For a guide to the syntax, follow some of the documentation [here](https://www.w3schools.com/python/python_sets.asp).

In [1]:
# Create set A
A = {2, 4, 6, 8, 10}
'Type A: {}, A: {}'.format(type(A), A) # "Type A: <class 'set'>, A: {2, 4, 6, 8, 10}"

"Type A: <class 'set'>, A: {2, 4, 6, 8, 10}"

In [2]:
# Create set B
B = {8, 11, 3, 12}
'Type B: {}, B: {}'.format(type(B), B) # "Type B: <class 'set'>, B: {8, 11, 3, 12}"

"Type B: <class 'set'>, B: {8, 3, 11, 12}"

In [3]:
# Create set C
C = {2, 3, 6, 8, 9, 11}
'Type C: {}, C: {}'.format(type(C), C) # "Type C: <class 'set'>, C: {2, 3, 6, 8, 9, 11}"

"Type C: <class 'set'>, C: {2, 3, 6, 8, 9, 11}"

In [4]:
# Create universal set U
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
'Type U: {}, U: {}'.format(type(U), U) # "Type U: <class 'set'>, U: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}"

"Type U: <class 'set'>, U: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}"

Now, verify your answers from part b by using the correct methods in Python. To help, the table below lists common operations on sets.

| Method                    | Equivalent | Result |
| ------                    | ------     | ------ |
| s.issubset(t)             | s <= t     | test whether every element in s is in t |
| s.issuperset(t)           | s >= t     | test whether every element in t is in s |
| s.union(t)                | s $\mid$ t | new set with elements from both s and t |
| s.intersection(t)         | s & t      | new set with elements common to s and t |
| s.difference(t)           | s - t      | new set with elements in s but not in t |
| s.symmetric_difference(t) | s ^ t      | new set with elements in either s or t but not both |
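
The sketch below checks, on two small made-up sets, that each method in the table agrees with its operator form. It also shows one practical difference: the methods accept any iterable, while the operators require both operands to be sets.

```python
s = {1, 2, 3}
t = {3, 4}

# Method and operator forms give the same result
assert s.union(t) == s | t == {1, 2, 3, 4}
assert s.intersection(t) == s & t == {3}
assert s.difference(t) == s - t == {1, 2}
assert s.symmetric_difference(t) == s ^ t == {1, 2, 4}

# Subset and superset tests
assert {1, 2}.issubset(s) and {1, 2} <= s
assert s.issuperset({1, 2}) and s >= {1, 2}

# Methods accept any iterable; operators require both operands to be sets
from_list = s.union([4, 5])
try:
    s | [4, 5]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

print(from_list, operator_accepts_list)
```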

#### 1. $A \cap B$

In [5]:
A_inters_B =  A&B
A_inters_B # {8}

{8}

#### 2. $A \cup C$

In [6]:
A_union_C = A|C
A_union_C # {2, 3, 4, 6, 8, 9, 10, 11}

{2, 3, 4, 6, 8, 9, 10, 11}

#### 3.  $A^c$ (you'll have to be a little creative here!)

In [7]:
A_comp = U-A
A_comp # {1, 3, 5, 7, 9, 11, 12}

{1, 3, 5, 7, 9, 11, 12}

#### 4.  $(A \cup B)^c $

In [8]:
A_union_B_comp = U-(A|B)
A_union_B_comp # {1, 5, 7, 9}

{1, 5, 7, 9}
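
Python sets have no built-in complement because a set does not know its universe, which is why `U - A` is needed above. The sketch below wraps that idea in a hypothetical helper called `complement` (it is not part of Python) and uses it to check De Morgan's laws on the lab's sets.

```python
U = set(range(1, 13))
A = {2, 4, 6, 8, 10}
B = {3, 8, 11, 12}

def complement(s, universe):
    """Return the elements of `universe` that are not in `s` (hypothetical helper)."""
    return universe - s

# De Morgan's laws: (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c
lhs_union = complement(A | B, U)
rhs_union = complement(A, U) & complement(B, U)
lhs_inters = complement(A & B, U)
rhs_inters = complement(A, U) | complement(B, U)

print(sorted(lhs_union))  # [1, 5, 7, 9]
print(lhs_union == rhs_union, lhs_inters == rhs_inters)  # True True
```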

#### 5. $B \cap C' $

In [9]:
B_inters_C_comp = B&(U-C)
B_inters_C_comp # {12}

{12}

#### 6. $A\backslash B$

In [10]:
A_min_B = A - B  # relative complement of B in A
A_min_B # {2, 4, 6, 10}

{2, 4, 6, 10}

#### 7. $C \backslash (B \backslash A) $

In [11]:
C_min_B_min_A = C - (B - A)
C_min_B_min_A # {2, 6, 8, 9}

{2, 6, 8, 9}

#### 8.  $(C \cap A) \cup (C \backslash B)$

In [12]:
C_inters_A_union_C_min_B = (C & A) | (C - B)
C_inters_A_union_C_min_B # {2, 6, 8, 9}

{2, 6, 8, 9}
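
A set difference is already a subset of its left operand, so intersecting it with that operand again changes nothing. The sketch below checks that identity on the lab's sets and compares short forms of exercises 6 to 8 with the answers worked out by hand in part b. The variable names are illustrative.

```python
A = {2, 4, 6, 8, 10}
B = {3, 8, 11, 12}
C = {2, 3, 6, 8, 9, 11}

# X & (X - Y) equals X - Y, because X - Y only contains elements of X
assert A & (A - B) == A - B
assert C & (C - B) == C - B

# Short forms of exercises 6, 7 and 8
a_minus_b = A - B
c_minus_b_minus_a = C - (B - A)
last = (C & A) | (C - B)

print(a_minus_b, c_minus_b_minus_a, last)
```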

## The Inclusion Exclusion Principle

Use the sets A, B and C from the previous exercise to verify the inclusion exclusion principle in Python.

Recall from the previous lesson that:

$$\mid A \cup B\cup C\mid = \mid A \mid + \mid B \mid + \mid C \mid - \mid A \cap B \mid  -\mid A \cap C \mid - \mid B \cap C \mid  + \mid A \cap B \cap C \mid $$

Combining these main commands:

| Method                    | Equivalent | Result |
| ------                    | ------     | ------ |
| a.union(b)                | a $\mid$ b | new set with elements from both a and b |
| a.intersection(b)         | a & b      | new set with elements common to a and b |

along with the `len(x)` function to get the cardinality of a given set x ("|x|").

What you'll do is translate the left-hand side of the inclusion exclusion equation into the object `left_hand_eq` and the right-hand side into the object `right_hand_eq`, and check that the two results are the same.


In [22]:
left_hand_eq = len((A.union(B)).union(C))
print(left_hand_eq)  # 9 elements in the set

9


In [23]:
right_hand_eq = len(A)+len(B)+len(C)-len(A.intersection(B))-len(A&C)-len(B&C)+len(A&B&C)
print(right_hand_eq) # 9 elements in the set

9


In [24]:
left_hand_eq == right_hand_eq # Use a comparison operator to compare `left_hand_eq` and `right_hand_eq`. Needs to say "True".

True
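
The same check can be written for any number of sets: add the sizes of all odd-sized intersections and subtract the sizes of all even-sized ones. The function `inclusion_exclusion` below is a hypothetical helper written for illustration, not something the lab requires.

```python
from itertools import combinations

def inclusion_exclusion(*sets):
    """Size of the union of `sets`, computed by the inclusion exclusion principle."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = 1 if k % 2 == 1 else -1  # add odd-sized intersections, subtract even-sized ones
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total

A = {2, 4, 6, 8, 10}
B = {3, 8, 11, 12}
C = {2, 3, 6, 8, 9, 11}

print(inclusion_exclusion(A, B, C))  # 9
print(len(A | B | C))                # 9
```

For three sets this expands to exactly the seven terms of the formula above.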

## Set Operations in Python

Mary is preparing for a road trip from her hometown, Boston, to Chicago. She has quite a few pets, yet luckily, so do her friends. They try to make sure that they take care of each other's pets while someone is away on a trip. A month ago, each person's pet collection was given by the following three sets:

In [39]:
Nina = set(["Cat","Dog","Rabbit","Donkey","Parrot", "Goldfish"])
Mary = set(["Dog","Chinchilla","Horse", "Chicken"])
Eve = set(["Rabbit", "Turtle", "Goldfish"])

In this exercise, you'll be able to use the following operations:

|Operation                          | Equivalent   | Result |
| ------                            | ------       | ------ |
|s.update(t)                        | s $\mid$= t  | update s in place, adding elements from t |
|s.intersection_update(t)           | s &= t       | update s in place, keeping only elements also found in t |
|s.difference_update(t)             | s -= t       | update s in place, removing elements found in t |
|s.symmetric_difference_update(t)   | s ^= t       | update s in place, keeping elements from s or t but not both |
|s.add(x)                           |              | add element x to set s |
|s.remove(x)                        |              | remove x from set s; raises `KeyError` if x is not present |
|s.discard(x)                       |              | remove x from set s if present |
|s.pop()                            |              | remove and return an arbitrary element from s |
|s.clear()                          |              | remove all elements from set s |

Apart from `s.pop()`, these methods modify `s` in place and return `None`.
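
The in-place methods behave differently from `union`, `intersection` and `difference`: they change the set they are called on and return `None`. The sketch below, on two made-up sets, shows that behavior and the difference between `remove` and `discard`.

```python
s = {"Cat", "Dog"}
t = {"Dog", "Horse"}

# In-place methods modify s and return None
result = s.update(t)
print(result)     # None
print(sorted(s))  # ['Cat', 'Dog', 'Horse']

# The non-mutating counterpart returns a new set instead
merged = {"Cat"}.union({"Dog"})

# remove() raises KeyError for a missing element; discard() does not
try:
    s.remove("Turtle")
    remove_raised = False
except KeyError:
    remove_raised = True

s.discard("Turtle")   # no error, s is unchanged
print(remove_raised)  # True
```

Assigning the result of an in-place method to a variable is therefore a common source of unexpected `None` values.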

Sadly, Eve's turtle passed away last week. Let's update her pet list accordingly.

In [40]:
Eve.remove("Turtle")
Eve # should be {'Rabbit', 'Goldfish'}

{'Goldfish', 'Rabbit'}

This time around, Nina promised to take care of Mary's pets while she's away. But she also wants to make sure her own pets are well taken care of. As Nina is already spending a considerable amount of time taking care of her own pets, adding a few more won't make that much of a difference. Nina does want to update her list while Mary is away.

In [41]:
Nina.update(Mary)
Nina # {'Chicken', 'Horse', 'Chinchilla', 'Parrot', 'Rabbit', 'Donkey', 'Dog', 'Cat', 'Goldfish'}

{'Cat',
 'Chicken',
 'Chinchilla',
 'Dog',
 'Donkey',
 'Goldfish',
 'Horse',
 'Parrot',
 'Rabbit'}

Mary, on the other hand, wants to clear her list altogether while away:

In [42]:
Mary.clear()
Mary  # set()

set()

Look at how many species Nina is taking care of right now.

In [43]:
n_species_Nina = len(Nina)
n_species_Nina # 9

9

In [44]:
Nina

{'Cat',
 'Chicken',
 'Chinchilla',
 'Dog',
 'Donkey',
 'Goldfish',
 'Horse',
 'Parrot',
 'Rabbit'}

Taking care of this many pets is weighing heavily on Nina. She remembers that Eve has had a smaller collection of pets lately, so she asks Eve to take care of the species they have in common. This way, the extra pets are not a huge effort on Eve's part. Let's update Nina's pet collection.

In [45]:
Nina.difference_update(Eve)
Nina # 7 species remain: {'Cat', 'Chicken', 'Chinchilla', 'Dog', 'Donkey', 'Horse', 'Parrot'}

{'Cat', 'Chicken', 'Chinchilla', 'Dog', 'Donkey', 'Horse', 'Parrot'}
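
The "common species" are the intersection of the two collections, and what Nina keeps is a set difference. The sketch below reproduces this step without modifying anything in place. The lowercase names `nina` and `eve` are stand-ins holding the collections as they are at this point in the story, so the lab's own variables stay untouched.

```python
nina = {"Cat", "Chicken", "Chinchilla", "Dog", "Donkey",
        "Goldfish", "Horse", "Parrot", "Rabbit"}
eve = {"Goldfish", "Rabbit"}

common = nina & eve     # species both friends have
remaining = nina - eve  # what Nina keeps; nina itself is not modified

print(sorted(common))   # ['Goldfish', 'Rabbit']
print(len(remaining))   # 7
```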

Taking care of 7 species is something Nina feels comfortable doing!

In [38]:
Eve

{'Goldfish', 'Rabbit'}

## Writing Down the Elements in a Set


Mary dropped off her pets at Nina's house and finally made her way to the highway. Awesome, her vacation has begun!
She's approaching an exit. At the end of this particular highway exit, cars can either turn left (L), go straight (S) or turn right (R). It's pretty busy and there are two cars driving close to her. What you'll do now is create several sets. You won't be using Python here; it's sufficient to write the sets down on paper. A good notion of sets and subsets will help you calculate probabilities in the next lab!

Note: each outcome describes what _all three cars_ are doing at a given time.

a. Create a set $A$ of all possible outcomes assuming that all three cars drive in the same direction.
           
b. Create a set $B$ of all possible outcomes assuming that all three cars drive in a different direction.
             
c. Create a set $C$ of all possible outcomes assuming that exactly 2 cars turn right.
            
d. Create a set $D$ of all possible outcomes assuming that exactly 2 cars drive in the same direction.

                          
e. Write down the interpretation and give all possible outcomes for the sets denoted by:
 - I. $D'$
 - II. $C \cap D$
 - III. $C \cup D$

### My answers:
Each outcome is written as an ordered triple giving the direction of car 1, car 2 and car 3. For example, RRL means the first two cars turn right and the third turns left.

A = {LLL, SSS, RRR}

B = {LSR, LRS, SLR, SRL, RLS, RSL}

C = {RRL, RLR, LRR, RRS, RSR, SRR}

D = {LLS, LSL, SLL, LLR, LRL, RLL, SSL, SLS, LSS,
     SSR, SRS, RSS, RRL, RLR, LRR, RRS, RSR, SRR}

D' = the outcomes in which it is not the case that exactly 2 cars drive in the same direction, so either all three drive in the same direction or all three drive in different directions: D' = A union B = {LLL, SSS, RRR, LSR, LRS, SLR, SRL, RLS, RSL}

C intersect D = the outcomes in which exactly 2 cars turn right and exactly 2 cars drive in the same direction. Every outcome in C is also in D, so C intersect D = C = {RRL, RLR, LRR, RRS, RSR, SRR}

C union D = the outcomes in which exactly 2 cars turn right or exactly 2 cars drive in the same direction. Because C is a subset of D, C union D = D.
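
Although this exercise is meant to be done on paper, the outcomes can be enumerated to check the counts. The sketch below assumes each outcome is an ordered triple (car 1, car 2, car 3), which gives 27 outcomes in total. The names `A` to `D` mirror this exercise and would overwrite the sets of the same name used earlier in the lab.

```python
from itertools import product

# Every outcome is an ordered triple: (car 1, car 2, car 3)
omega = set(product("LSR", repeat=3))

A = {o for o in omega if len(set(o)) == 1}   # all three in the same direction
B = {o for o in omega if len(set(o)) == 3}   # all three in different directions
C = {o for o in omega if o.count("R") == 2}  # exactly two turn right
D = {o for o in omega if len(set(o)) == 2}   # exactly two in the same direction

print(len(omega), len(A), len(B), len(C), len(D))  # 27 3 6 6 18
print(omega - D == A | B)      # True: D' is "all the same or all different"
print(C & D == C, C | D == D)  # True True: C is a subset of D
```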


## Optional Exercise: European Countries

Use set operations to determine which European countries are not in the European Union. You just might have to clean the data first with pandas.

In [46]:
import pandas as pd

#Load Europe and EU
europe = pd.read_excel('Europe_and_EU.xlsx', sheet_name = 'Europe') 
eu = pd.read_excel('Europe_and_EU.xlsx', sheet_name = 'EU')

#Use pandas to remove any whitespace from names

In [50]:
#preview dataframe
eu.head()

Unnamed: 0,Rank,Country,2017 population,% of pop.,Average relative annual growth,Average absolute annual growth,Official figure,Date of last figure,Source
0,1,Germany,82800000,16.18,0.76,628876,82576900,2017-03-31,Official estimate
1,2,France,67210459,13.1,0.4,265557,67174000,2018-01-01,Monthly official estimate
2,3,United Kingdom,65808573,12.86,0.65,428793,65648100,2017-06-30,Official estimate
3,4,Italy,60589445,11.84,-0.13,-76001,60494000,2018-01-01,Monthly official estimate
4,5,Spain,46528966,9.09,0.19,89037,46549045,2017-07-01,Official estimate


In [53]:
print(len(eu.Country.unique()))
eu.Country.unique()

28


array([' Germany ', 'France', 'United Kingdom', ' Italy ', ' Spain ',
       ' Poland ', ' Romania ', ' Netherlands ', ' Belgium ', ' Greece ',
       ' Czech Republic ', ' Portugal ', ' Sweden ', ' Hungary ',
       ' Austria ', ' Bulgaria ', ' Denmark ', ' Finland ', ' Slovakia ',
       ' Ireland ', ' Croatia ', ' Lithuania ', ' Slovenia ', ' Latvia ',
       ' Estonia ', ' Cyprus ', ' Luxembourg ', ' Malta '], dtype=object)

In [49]:
europe.head() 

Unnamed: 0,Rank,Country,Population,% of population,Average relative annual growth (%),Average absolute annual growth,Estimated doubling time (Years),Official figure (where available),Date of last figure,Regional grouping,Source
0,1.0,Russia,143964709,17.15,0.19,294285,368,146839993,2017-01-01 00:00:00,EAEU,[1]
1,2.0,Germany,82521653,9.8,1.2,600000,90,82800000,2016-12-31 00:00:00,EU,Official estimate
2,3.0,Turkey,80810000,9.6,1.34,1035000,52,77695904,2016-12-31 00:00:00,,[2]
3,4.0,France,65233271,7.76,0.39,261022,177,66991000,2017-03-01 00:00:00,EU,[3]
4,5.0,United Kingdom,65110276,7.75,0.75,484000,92,66573504,2016-12-30 00:00:00,EU,Official estimate


In [54]:
print(len(europe.Country.unique()))
europe.Country.unique()

57


array([' Russia', ' Germany ', ' Turkey ', 'France', 'United Kingdom',
       ' Italy ', ' Spain ', ' Ukraine', ' Poland ', ' Romania ',
       ' Netherlands ', ' Belgium ', ' Greece ', ' Czech Republic ',
       ' Portugal ', ' Sweden ', ' Hungary ', ' Azerbaijan ', ' Belarus ',
       ' Serbia ', ' Austria ', '  Switzerland ', ' Bulgaria ',
       ' Denmark ', ' Finland ', ' Slovakia ', ' Norway ', ' Ireland ',
       ' Croatia ', ' Bosnia and Herzegovina ', ' Georgia ', ' Moldova ',
       ' Armenia ', ' Lithuania ', ' Albania ', ' Macedonia ',
       ' Slovenia ', ' Latvia ', ' Kosovo ', ' Estonia ', ' Cyprus ',
       ' Montenegro ', ' Luxembourg ', ' Malta ', ' Iceland ',
       ' Jersey (UK) ', ' Isle of Man (UK) ', ' Andorra ',
       ' Guernsey (UK) ', ' Faroe Islands (Denmark) ', ' Liechtenstein ',
       ' Monaco ', ' Gibraltar (UK)', ' San Marino ',
       ' Åland Islands (Finland) ', ' Svalbard (Norway) ',
       '  Vatican City '], dtype=object)
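
The arrays above show that many names carry leading or trailing spaces, so identical countries would not compare equal across the two sheets. The sketch below uses a small made-up `Series` (not the lab's Excel data) to contrast two pandas cleaning options: `str.strip()`, which trims only the ends, and `str.replace(' ', '')`, which removes every space.

```python
import pandas as pd

# Hypothetical sample mimicking the padded names seen above
names = pd.Series([" Germany ", "France", "United Kingdom", "  Vatican City "])

stripped = names.str.strip()                         # trims only the ends
no_spaces = names.str.replace(" ", "", regex=False)  # removes every space

print(stripped.tolist())
print(no_spaces.tolist())
```

Either option gives consistent names as long as the same one is applied to both data sets; `str.strip()` keeps multi-word names readable.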

In [60]:
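# Your code comes here
# Note: replacing ' ' removes every space, including those inside names
# (e.g. 'UnitedKingdom'); europe.Country.str.strip() would trim only the ends.
europe.Country = europe.Country.str.replace(' ', '')
europe.Country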

0                    Russia
1                   Germany
2                    Turkey
3                    France
4             UnitedKingdom
5                     Italy
6                     Spain
7                   Ukraine
8                    Poland
9                   Romania
10              Netherlands
11                  Belgium
12                   Greece
13            CzechRepublic
14                 Portugal
15                   Sweden
16                  Hungary
17               Azerbaijan
18                  Belarus
19                   Serbia
20                  Austria
21              Switzerland
22                 Bulgaria
23                  Denmark
24                  Finland
25                 Slovakia
26                   Norway
27                  Ireland
28                  Croatia
29     BosniaandHerzegovina
30                  Georgia
31                  Moldova
32                  Armenia
33                Lithuania
34                  Albania
35                Ma

In [69]:
EURO = set(europe.Country)
EURO

{'Albania',
 'Andorra',
 'Armenia',
 'Austria',
 'Azerbaijan',
 'Belarus',
 'Belgium',
 'BosniaandHerzegovina',
 'Bulgaria',
 'Croatia',
 'Cyprus',
 'CzechRepublic',
 'Denmark',
 'Estonia',
 'FaroeIslands(Denmark)',
 'Finland',
 'France',
 'Georgia',
 'Germany',
 'Gibraltar(UK)',
 'Greece',
 'Guernsey(UK)',
 'Hungary',
 'Iceland',
 'Ireland',
 'IsleofMan(UK)',
 'Italy',
 'Jersey(UK)',
 'Kosovo',
 'Latvia',
 'Liechtenstein',
 'Lithuania',
 'Luxembourg',
 'Macedonia',
 'Malta',
 'Moldova',
 'Monaco',
 'Montenegro',
 'Netherlands',
 'Norway',
 'Poland',
 'Portugal',
 'Romania',
 'Russia',
 'SanMarino',
 'Serbia',
 'Slovakia',
 'Slovenia',
 'Spain',
 'Svalbard(Norway)',
 'Sweden',
 'Switzerland',
 'Turkey',
 'Ukraine',
 'UnitedKingdom',
 'VaticanCity',
 'ÅlandIslands(Finland)'}

In [56]:
eu.Country

0             Germany 
1               France
2       United Kingdom
3               Italy 
4               Spain 
5              Poland 
6             Romania 
7         Netherlands 
8             Belgium 
9              Greece 
10     Czech Republic 
11           Portugal 
12             Sweden 
13            Hungary 
14            Austria 
15           Bulgaria 
16            Denmark 
17            Finland 
18           Slovakia 
19            Ireland 
20            Croatia 
21          Lithuania 
22           Slovenia 
23             Latvia 
24            Estonia 
25             Cyprus 
26         Luxembourg 
27              Malta 
Name: Country, dtype: object

In [59]:
eu.Country = eu.Country.str.replace(' ', '')
eu.Country

0           Germany
1            France
2     UnitedKingdom
3             Italy
4             Spain
5            Poland
6           Romania
7       Netherlands
8           Belgium
9            Greece
10    CzechRepublic
11         Portugal
12           Sweden
13          Hungary
14          Austria
15         Bulgaria
16          Denmark
17          Finland
18         Slovakia
19          Ireland
20          Croatia
21        Lithuania
22         Slovenia
23           Latvia
24          Estonia
25           Cyprus
26       Luxembourg
27            Malta
Name: Country, dtype: object

In [68]:
EU = set(eu.Country)
EU

{'Austria',
 'Belgium',
 'Bulgaria',
 'Croatia',
 'Cyprus',
 'CzechRepublic',
 'Denmark',
 'Estonia',
 'Finland',
 'France',
 'Germany',
 'Greece',
 'Hungary',
 'Ireland',
 'Italy',
 'Latvia',
 'Lithuania',
 'Luxembourg',
 'Malta',
 'Netherlands',
 'Poland',
 'Portugal',
 'Romania',
 'Slovakia',
 'Slovenia',
 'Spain',
 'Sweden',
 'UnitedKingdom'}

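In [76]:
# difference_update() modifies EURO in place and returns None,
# so use difference() to get the result as a new set
Non_EU_Countries = EURO.difference(EU)
type(Non_EU_Countries)

set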

In [75]:
# Flag each row whose country is not in the EU set
# (difference_update() returns None, so it cannot be used to fill a column)
europe['Non_EU_Countries'] = ~europe.Country.isin(EU)
europe


Unnamed: 0,Rank,Country,Population,% of population,Average relative annual growth (%),Average absolute annual growth,Estimated doubling time (Years),Official figure (where available),Date of last figure,Regional grouping,Source,Non_EU_Countries
0,1.0,Russia,143964709,17.15,0.19,294285,368,146839993,2017-01-01 00:00:00,EAEU,[1],True
1,2.0,Germany,82521653,9.8,1.2,600000,90,82800000,2016-12-31 00:00:00,EU,Official estimate,False
2,3.0,Turkey,80810000,9.6,1.34,1035000,52,77695904,2016-12-31 00:00:00,,[2],True
3,4.0,France,65233271,7.76,0.39,261022,177,66991000,2017-03-01 00:00:00,EU,[3],False
4,5.0,UnitedKingdom,65110276,7.75,0.75,484000,92,66573504,2016-12-30 00:00:00,EU,Official estimate,False
5,6.0,Italy,59320118,7.21,0.49,298000,141,60788845,2017-01-01 00:00:00,EU,Monthly official estimate,False
6,7.0,Spain,46397452,5.53,-0.06,-28000,-,46449565,2015-01-01 00:00:00,EU,Official estimate,False
7,8.0,Ukraine,42895704,5.1,-0.32,-136000,-,42726067,2015-03-01 00:00:00,,Monthly official estimate,True
8,9.0,Poland,38104832,4.58,0.05,20000,1334,38484000,2014-12-31 00:00:00,EU,Official estimate,False
9,10.0,Romania,19622000,2.36,-0.41,-81000,-,19942642,2014-01-01 00:00:00,EU,Official estimate,False
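
The set logic behind this exercise can be sketched without the Excel file. The two sets below are small hand-picked samples of the cleaned names, not the full data, and the names `europe_sample` and `eu_sample` are illustrative. The subset test is a cheap sanity check on the cleaning step: if an EU name were missing from the Europe set, a spelling or whitespace mismatch would be the likely cause.

```python
# Small hand-picked samples of the cleaned names, not the full data
europe_sample = {"Russia", "Germany", "Turkey", "France", "Ukraine", "Poland"}
eu_sample = {"Germany", "France", "Poland"}

# Sanity check: after cleaning, every EU name should also appear in the Europe set
assert eu_sample.issubset(europe_sample)

# difference() returns a new set and leaves europe_sample unchanged
non_eu = europe_sample.difference(eu_sample)
print(sorted(non_eu))  # ['Russia', 'Turkey', 'Ukraine']
```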


## Summary

In this lab, you practiced your knowledge of sets, including common set operations, the use of Venn diagrams, the inclusion exclusion principle, and how to use sets in Python!