# MSDS 400 Sets and Probability

The data for the following example are defined in the cell below. The first step is to define the data as set objects. Various set operations are then demonstrated. Finally, a set is converted to a list.

In [36]:
# set([1, 2, 3]) is the same as {1, 2, 3}. The former is easier to read, but the latter is typically faster because no intermediate list is built.
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
A = {1, 2, 4, 5, 7}
B = {2, 4, 5, 7, 9, 11}

The following cells create new variables only to facilitate printing.
The variables Uab, AB, Ac, Bc, AsB, R, Add, Addc and SD are for my
convenience only. The important thing is to study the set operations.

In [37]:
Uab = A | B
print('Union of A and B =', Uab)

Union of A and B = {1, 2, 4, 5, 7, 9, 11}


In [38]:
AB = A & B
print('Intersection of A and B =', AB)

Intersection of A and B = {2, 4, 5, 7}


In [39]:
Ac = U - A
Bc = U - B
print('A complement =', Ac)
print('B complement =', Bc)

A complement = {3, 6, 8, 9, 10, 11}
B complement = {1, 3, 6, 8, 10}


In [40]:
# Symmetric difference of A and B: the elements of A or B that are not common to both.
AsB = A ^ B
print('Symmetric difference of A and B = ', AsB)

Symmetric difference of A and B =  {1, 9, 11}


In [41]:
SD = (A | B) - (A & B)  # Example showing another way to obtain the symmetric difference
print('Symmetric difference by union and intersection =', SD)

Symmetric difference by union and intersection = {1, 11, 9}
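
The two results above print as {1, 9, 11} and {1, 11, 9} because sets are unordered; they are the same set. As a minimal sketch, the snippet below redefines A and B so it runs on its own and checks three equivalent ways of forming the symmetric difference. The names by_operator, by_method and by_differences are illustrative only and do not appear in the original notebook.

```python
# Same example sets as above, redefined so the snippet is self-contained.
A = {1, 2, 4, 5, 7}
B = {2, 4, 5, 7, 9, 11}

by_operator = A ^ B                      # operator form
by_method = A.symmetric_difference(B)    # method form
by_differences = (A - B) | (B - A)       # elements only in A, plus elements only in B

print(by_operator == by_method == by_differences)   # True
print(sorted(by_operator))                           # [1, 9, 11]
```

Using sorted() for display gives a stable order regardless of how the set happens to iterate.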


In [42]:
R = Ac | Bc | AB  # Union of several sets
print('Union of Ac, Bc and AB  ', R)
print('Original set U was ', U)

Union of Ac, Bc and AB   {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
Original set U was  {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
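
The union above recovers U because Ac | Bc is the complement of A & B (one of De Morgan's laws), and any set united with its complement gives the universe. A short check, assuming the same U, A and B as in the first cell (redefined here so the snippet is self-contained):

```python
U = set(range(1, 12))        # {1, ..., 11}
A = {1, 2, 4, 5, 7}
B = {2, 4, 5, 7, 9, 11}
Ac = U - A                   # complement of A relative to U
Bc = U - B                   # complement of B relative to U

# De Morgan's laws for complements taken relative to U.
print((Ac | Bc) == U - (A & B))   # True
print((Ac & Bc) == U - (A | B))   # True

# A set united with its complement recovers the universe.
print(((A & B) | (U - (A & B))) == U)   # True
```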


In [43]:
Add = {12, 13, 14}  # Items can be added to sets using the union operation
U = U | Add
print('Updated version of U = ', U)

Updated version of U =  {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}


In [44]:
Addc = U - Add  # Removal is possible using the complement of Add relative to U
U = U & Addc
print('Original version of U =', U)

Original version of U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
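
Python sets also provide in-place operators and methods for the same additions and removals. The sketch below is an alternative to the union and complement approach above, not a replacement for it; it rebuilds U locally so it can run by itself.

```python
U = set(range(1, 12))   # {1, ..., 11}
Add = {12, 13, 14}

U |= Add                # in-place union; U.update(Add) does the same
print(sorted(U))

U -= Add                # in-place difference; U.difference_update(Add) does the same
print(sorted(U))

U.add(12)               # add a single element
U.discard(12)           # remove it again; discard() does not raise if the element is absent
print(U == set(range(1, 12)))   # True
```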


In [45]:
U = list(U)  # Following these operations, a set may be converted to a list.
print('U is now a list.', U)

U is now a list. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
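
Because sets are unordered, list(U) makes no promise about element order, even though small integers often happen to come out sorted in CPython. When order matters, sorted() returns a new list in ascending order. A minimal illustration, with U redefined locally and U_sorted as an illustrative name:

```python
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

U_sorted = sorted(U)          # ascending order, regardless of set iteration order
print(U_sorted)               # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(set(U_sorted) == U)     # True: converting back loses nothing
```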


Generate the universe U. Note that range(1, 27) is used to produce a sequence of 26 positive integers for set operations. Remember, range() includes the first value, i.e. 1, but not the last, i.e. 27. An optional third argument defines the step between consecutive elements; it defaults to 1, so range(1, 27, 1) is equivalent. Note that the functions used in this module are built into Python itself.

Slice U into three subsequences, which are then converted to sets. Note that slicing is inclusive of the first indexed element and exclusive of the last indexed element. Also, the first element has 0 for an index.

In [46]:
U = range(1, 27)
A = U[0:13]
B = U[13:]
C = U[7:20]

A = set(A)
B = set(B)
C = set(C)
U = set(U)
print('The universe U is', U)

print('A is', A)  # Compare these sets to the slice statements above.
print('B is', B)
print('C is', C)

The universe U is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26}
A is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}
B is {14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26}
C is {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
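
To make the slice boundaries concrete, the sketch below repeats the slicing with the same indices and checks two facts visible in the output above: A and B split U into two disjoint halves, while C overlaps both. It assumes Python 3, where range() returns a sliceable range object rather than a list.

```python
U = range(1, 27)        # the integers 1 through 26
A = set(U[0:13])        # indices 0 to 12  -> values 1 to 13
B = set(U[13:])         # indices 13 to 25 -> values 14 to 26
C = set(U[7:20])        # indices 7 to 19  -> values 8 to 20
U = set(U)

print(len(U), len(A), len(B), len(C))    # 26 13 13 13
print(A | B == U, A & B == set())        # True True
print(sorted(A & C))                     # [8, 9, 10, 11, 12, 13]
print(sorted(B & C))                     # [14, 15, 16, 17, 18, 19, 20]
```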


It is assumed in this module that each element of U occurs with equal probability, as would be the case with random sampling with replacement. This assumption allows the probability of a set to be calculated as the ratio of the number of elements in the set to the number of elements in U, which in this module is 26. Note that %r and %s can both be used with strings and lists, although the cells below pass values to print() directly.

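Convert the length of U to floating point, stored in T, to avoid integer division. (In Python 3 the / operator already performs true division, so the conversion matters only under Python 2.) The functions round() and float() were introduced in earlier modules.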

In [47]:
T = float(len(U))

Null = A & B  # Demonstrate the null intersection and probability.
print('Probability of null intersection ', round((len(Null)/T), 3))

Probability of null intersection  0.0
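
The ratio described above can be wrapped in a small helper. This is a sketch only: the function name prob is hypothetical, and fractions.Fraction is used here (instead of float and round as in the notebook) so that the results are exact.

```python
from fractions import Fraction

def prob(event, universe):
    """Probability of `event` when every element of `universe` is equally likely."""
    # Intersect with the universe so stray elements outside it are not counted.
    return Fraction(len(event & universe), len(universe))

U = set(range(1, 27))
A = set(range(1, 14))
B = set(range(14, 27))

print(prob(A, U))                    # 1/2
print(prob(A & B, U))                # 0
print(round(float(prob(A, U)), 3))   # 0.5
```

Converting to float only at the point of display keeps intermediate results free of rounding error.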


In [48]:
print('Intersection of A and C is ', (A & C))  # Demonstrate the Union Rule for Probability.
print('Union of A and C ', (A | C))

P = (len(A) + len(C) - len(A & C))
print('Probability of A union C = ', round((len(A | C)/T), 3))
print('Result of Union Rule Summation =', round(P/T, 3))

Intersection of A and C is  {8, 9, 10, 11, 12, 13}
Union of A and C  {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
Probability of A union C =  0.769
Result of Union Rule Summation = 0.769
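
The Union Rule states that P(A or C) = P(A) + P(C) - P(A and C). As a minimal sketch, the check below evaluates both sides with fractions.Fraction so that no rounding is involved. The variable names (n, p_A, p_C, p_A_and_C, p_A_or_C) are illustrative and not part of the original notebook.

```python
from fractions import Fraction

U = set(range(1, 27))
A = set(range(1, 14))
C = set(range(8, 21))
n = len(U)

p_A = Fraction(len(A), n)
p_C = Fraction(len(C), n)
p_A_and_C = Fraction(len(A & C), n)
p_A_or_C = Fraction(len(A | C), n)

print(p_A + p_C - p_A_and_C)                 # 10/13
print(p_A_or_C == p_A + p_C - p_A_and_C)     # True
print(round(float(p_A_or_C), 3))             # 0.769
```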


In [49]:
# Demonstrate the calculation of Odds using the complement rule.
print('Complement of C is', (U-C))
print('Odds of C are', round((len(C)/float(len(U-C))), 3))

Complement of C is {1, 2, 3, 4, 5, 6, 7, 21, 22, 23, 24, 25, 26}
Odds of C are 1.0


In [50]:
print('Complement of A intersection C is', U-(A & C))
P = (len(A & C)/float(len(U-(A & C))))
print('Odds of A intersection C are', round(P, 3))

Complement of A intersection C is {1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26}
Odds of A intersection C are 0.3
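
Odds in favor of an event compare the number of outcomes in the event with the number in its complement, which is equivalent to P(E) / (1 - P(E)). A minimal sketch, using a hypothetical helper named odds and the same U, A and C as above (redefined so the snippet runs on its own):

```python
from fractions import Fraction

def odds(event, universe):
    """Odds in favor of `event`: |event| / |complement of event|.

    Assumes `event` is a proper subset of `universe`; otherwise the
    complement is empty and Fraction raises ZeroDivisionError.
    """
    return Fraction(len(event), len(universe - event))

U = set(range(1, 27))
A = set(range(1, 14))
C = set(range(8, 21))

print(odds(C, U))          # 1
print(odds(A & C, U))      # 3/10

# Converting odds back to a probability: p = odds / (1 + odds).
o = odds(A & C, U)
print(o / (1 + o))         # 3/13, the same as 6/26
```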


In [51]:
# Demonstrate conditional probability.
P = round(len(A & C)/float(len(C)), 3)
print('Conditional probability of A given C is', P)

Conditional probability of A given C is 0.462


In [52]:
# Demonstrate the product rule.
print('Probability of A and C intersection is', round((len(A & C)/T), 3))
Q = len(C)/T
print('Probability of C is', round(Q, 3))
print('Product rule result for A and C intersection is', round(P*Q, 3))

Probability of A and C intersection is 0.231
Probability of C is 0.5
Product rule result for A and C intersection is 0.231
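
Note that the cell above multiplies the already rounded value of P (0.462) by Q; the rounded product still agrees to three decimals here. The sketch below repeats the conditional probability and the product rule, P(A and C) = P(A given C) * P(C), with exact fractions. The names p_A_given_C, p_C and p_A_and_C are illustrative and do not appear in the original notebook.

```python
from fractions import Fraction

U = set(range(1, 27))
A = set(range(1, 14))
C = set(range(8, 21))

p_A_given_C = Fraction(len(A & C), len(C))   # P(A given C) = |A & C| / |C|
p_C = Fraction(len(C), len(U))
p_A_and_C = Fraction(len(A & C), len(U))

print(p_A_given_C)                         # 6/13
print(round(float(p_A_given_C), 3))        # 0.462
print(p_A_given_C * p_C == p_A_and_C)      # True
print(round(float(p_A_and_C), 3))          # 0.231
```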
