# 9. Randomness

- We've learned plenty about describing data. Now we turn to randomness, which is the basis of experimental design.

In Chapter 9, we'll be talking about:
#### 9.1. Conditional Statements (if, if-else, elif, else)
#### 9.2. Iteration (count-controlled vs. condition-controlled)
#### 9.3. Simulation
#### 9.4. The Monty Hall Problem
#### 9.5. Finding Probabilities

In [123]:
### Python has its own random module (random) with plenty of methods.
### https://www.w3schools.com/python/module_random.asp
### The numpy.random library contains a few extra probability distributions 
### commonly used in scientific research, as well as a couple of convenience functions 
### for generating arrays of random data. 

### Python library, package, module, framework ###
### https://learnpython.com/blog/python-modules-packages-libraries-frameworks/
### Module: a file of related code saved with the extension .py
### Package: a directory that holds a collection of modules.
### Library: an umbrella term for a collection of related modules and packages.
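
As a rough illustration of the module/package distinction described in the comments above, the sketch below inspects two standard-library imports. It relies on the fact that a package object carries a `__path__` attribute while a plain module does not. The helper name `kind` is made up for this example.

```python
import json    # standard-library package (a directory of modules)
import random  # standard-library module (a single .py file)

def kind(mod):
    """Hypothetical helper: label an imported object as 'package' or 'module'."""
    # Packages expose __path__, the list of directories searched for submodules.
    return "package" if hasattr(mod, "__path__") else "module"

print(random.__name__, "is a", kind(random))
print(json.__name__, "is a", kind(json))
```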

In [124]:
from IPython.display import Image
Image(url="../../images/python-random-module.jpg")

In [134]:
##### Python random.random()
import random
random.random()

0.6569713720121041

In [151]:
#####  numpy.random
# import numpy as np
import numpy
numpy.random.random()

0.2173779029656715
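
Both calls above return a float in the half-open interval [0, 1), and the value changes on every run. If a repeatable result is wanted (for debugging or for sharing a notebook), the generator can be seeded first. A minimal sketch with the standard-library `random` module; the seed value 42 is arbitrary:

```python
import random

random.seed(42)                      # fix the starting state (42 is an arbitrary choice)
first_run = [random.random() for _ in range(3)]

random.seed(42)                      # same seed, so the same sequence is replayed
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)       # the two runs match
print(all(0.0 <= value < 1.0 for value in first_run))
```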

In [152]:
### headers
path_data = '../../../assets/data/'
from datascience import *
import numpy as np
import matplotlib  # must be imported before matplotlib.use can be called
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

In [72]:
two_groups = make_array('treatment', 'control')
two_groups

array(['treatment', 'control'],
      dtype='<U9')

In [73]:
np.random.choice(two_groups)

'treatment'

In [74]:
np.random.choice(two_groups, 10)

array(['control', 'treatment', 'control', 'control', 'control', 'control',
       'control', 'control', 'treatment', 'treatment'],
      dtype='<U9')

In [78]:
# two_groups = Table().with_columns(
#     'treatment',  make_array( 1, 2, 3 ),
#     'control',    make_array( 4, 5, 6 )
# )
# two_groups   ### try something else

In [157]:
##### how good is random.choice? #####
def repeat(str_arr, num):
    """Draw num random elements from str_arr, one at a time."""
    choices = []
    for i in range(num):
        choices.append(np.random.choice(str_arr))
    return choices

lst = repeat(two_groups, 50000)
# count how many of the 50,000 draws landed in 'treatment'
lst.count('treatment')

25007

In [159]:
##### No loop needed: choice() takes a second argument (size) and returns an array
##### np.random.choice
##### np.random.choice(array)
##### np.random.choice(array, size, replace=True)

groups = np.random.choice(two_groups, 100000)
groups

array(['control', 'treatment', 'treatment', ..., 'treatment', 'treatment',
       'control'],
      dtype='<U9')
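
The `replace` argument listed in the comments above controls whether an element can be drawn more than once. A small sketch, assuming plain NumPy (`np.array` stands in for `make_array`):

```python
import numpy as np

two_groups = np.array(['treatment', 'control'])

# With replacement (the default), the same label can appear many times.
with_replacement = np.random.choice(two_groups, 10)

# Without replacement, each label is used at most once,
# so the requested size cannot exceed the length of the array.
without_replacement = np.random.choice(two_groups, 2, replace=False)

print(with_replacement)
print(without_replacement)
```

Drawing without replacement from all elements amounts to shuffling the array.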

In [161]:
result = Table().with_column(
    "group", groups
)
result.show(3)

group
control
treatment
treatment


In [162]:
result.group('group')

group,count
control,49928
treatment,50072


A fundamental question about **random** events is whether or not they occur. For example:

- Did an individual get assigned to the treatment group, or not?
- Is a gambler going to win money, or not?
- Has a poll made an accurate prediction, or not?

Once the event has occurred, you can answer “yes” or “no” to all these questions. In programming, it is conventional to do this by labeling statements as True or False. For example, if an individual did get assigned to the treatment group, then the statement, “The individual was assigned to the treatment group” would be `True`. If not, it would be `False`.


## Booleans and Comparison 

Boolean values most often arise from comparison **operators**.

In [79]:
3 > 1 + 1

True

In [80]:
### what would the result of the following operation be? and why?

1 < 1 + 1 < 3

True

The value `True` indicates that the comparison is valid; Python has confirmed this simple fact about the relationship between `3` and `1+1`. The full set of common comparison operators is listed below.

| Comparison            | Operator | True example | False example |
|-----------------------|----------|--------------|---------------|
| Less than             | <        | 2 < 3        | 2 < 2         |
| Greater than          | >        | 3 > 2        | 3 > 3         |
| Less than or equal    | <=       | 2 <= 2       | 3 <= 2        |
| Greater than or equal | >=       | 3 >= 3       | 2 >= 3        |
| Equal                 | ==       | 3 == 3       | 3 == 2        |
| Not equal             | !=       | 3 != 2       | 2 != 2        |

Notice the two equal signs `==` in the comparison to determine equality. This is necessary because Python already uses `=` to mean assignment to a name, as we have seen. It can't use the same symbol for a different purpose. Thus if you want to check whether 5 is equal to 10/2, you have to be careful: `5 = 10/2` raises a `SyntaxError` because Python assumes you are trying to assign the value of the expression 10/2 to a name that is the numeral 5. Instead, you must use `5 == 10/2`, which evaluates to `True`.

In [85]:
5 = 10/2

SyntaxError: cannot assign to literal here. Maybe you meant '==' instead of '='? (1531505505.py, line 1)

In [86]:
5 == 10/2

True
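
The difference between `=` and `==` can also be checked programmatically. The sketch below uses the built-in `compile` to show that `5 = 10/2` is rejected before any code runs, and that `5 == 10/2` compares an integer with the float `5.0` by value. The variable names are only for illustration.

```python
# 10/2 uses true division, so the result is the float 5.0.
quotient = 10 / 2
print(type(quotient).__name__, quotient)

# Integers and floats are compared by numeric value.
print(5 == quotient)

# Assignment to a literal is a syntax error; compile() parses the text without running it.
try:
    compile("5 = 10/2", "<example>", "exec")
    assignment_rejected = False
except SyntaxError:
    assignment_rejected = True
print(assignment_rejected)
```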

An expression can contain multiple comparisons, and they all must hold in order for the whole expression to be `True`. For example, we can express that `1+1` is between `1` and `3` using the following expression.

In [88]:
1 < 1 + 1 < 3

True
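
A chained comparison behaves like its individual comparisons joined with `and`. The sketch below spells that out for the expression above and adds one case where the chain fails. The names `chained`, `spelled_out`, and `fails` are only for illustration.

```python
middle = 1 + 1

chained = 1 < middle < 3
spelled_out = (1 < middle) and (middle < 3)
print(chained, spelled_out)

# If any single comparison fails, the whole chain is False.
fails = 1 < middle < 2
print(fails)
```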

The average of two numbers is always between the smaller number and the larger number. We express this relationship for the numbers `x` and `y` below. You can try different values of `x` and `y` to confirm this relationship.

In [91]:
x = 12
y = 5
min(x, y) <= (x + y)/2 <= max(x, y)

True
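
Rather than trying values by hand, the relationship can be checked for many random pairs at once. A minimal sketch using the standard-library `random` module; the range of integers and the number of trials are arbitrary choices:

```python
import random

trials = 1000
holds_every_time = True
for _ in range(trials):
    x = random.randint(-100, 100)
    y = random.randint(-100, 100)
    # The average must lie between the smaller and the larger value.
    if not (min(x, y) <= (x + y) / 2 <= max(x, y)):
        holds_every_time = False

print(holds_every_time)
```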

In [93]:
### what would the result of the following operation be? and why?
### suppose that x and y are defined

min(x, y) <= (x+y)/2 <= max(x, y)

True

### String Comparison

In [96]:
a

NameError: name 'a' is not defined

In [98]:
"a"

'a'

In [99]:
"a"  "a"

'aa'

In [103]:
"Dog" > "dog"

False

In [104]:
'Dog' > 'Cat'

True
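
Python compares strings character by character using each character's Unicode code point, which the built-in `ord` reveals. Uppercase letters have smaller code points than lowercase ones, which explains both results above. The `"a"  "a"` cell also shows that two adjacent string literals are joined into one string.

```python
# Code points decide the ordering: uppercase letters come before lowercase ones.
print(ord('C'), ord('D'), ord('d'))

print("Dog" > "dog")   # False, because ord('D') < ord('d')
print("Dog" > "Cat")   # True, because ord('D') > ord('C')

# Adjacent string literals are concatenated into a single string.
joined = "a"  "a"
print(joined == "a" + "a")
```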

Let's return to random selection. Recall the array `two_groups` which consists of just two elements, `treatment` and `control`. To see whether a randomly assigned individual went to the treatment group, you can use a comparison:

In [105]:
np.random.choice(two_groups) == 'treatment'

True

As before, the random choice will not always be the same, so the result of the comparison won't always be the same either. It will depend on whether `treatment` or `control` was chosen. With any cell that involves random selection, it is a good idea to run the cell several times to get a sense of the variability in the result.
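
Running the cell by hand several times can also be automated. The sketch below repeats the comparison many times and reports the proportion of `True` results, which should be close to 0.5. It assumes plain NumPy (`np.array` in place of `make_array`), and the repetition count is arbitrary.

```python
import numpy as np

two_groups = np.array(['treatment', 'control'])
repetitions = 10000

# One Boolean per repetition: was the randomly chosen label 'treatment'?
outcomes = [np.random.choice(two_groups) == 'treatment' for _ in range(repetitions)]

proportion_true = sum(outcomes) / repetitions
print(proportion_true)
```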

### Comparing an Array and a Value

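We have mostly worked with tables and columns (arrays) in this course, so it is useful to know that comparing an array with a single value compares each element of the array to that value.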

In [111]:
##### now compare an array and a value:

tosses = make_array('Tails', 'Heads', 'Tails', 'Heads', 'Heads')
tosses

array(['Tails', 'Heads', 'Tails', 'Heads', 'Heads'],
      dtype='<U5')

In [113]:
##### What would the following operation give us?
tosses == 'Heads'

array([False,  True, False,  True,  True], dtype=bool)

In [116]:
##### The numpy function count_nonzero evaluates to the number of 
##### non-zero (that is, True) elements of the array.

np.count_nonzero(tosses == 'Heads')

3
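
Because `True` counts as 1 and `False` as 0 in arithmetic, the Boolean array can also be summed or averaged. A short sketch with plain NumPy (`np.array` in place of `make_array`); the name `is_heads` is only for illustration:

```python
import numpy as np

tosses = np.array(['Tails', 'Heads', 'Tails', 'Heads', 'Heads'])
is_heads = tosses == 'Heads'          # elementwise comparison gives a Boolean array

num_heads = np.count_nonzero(is_heads)
print(num_heads)                      # 3
print(is_heads.sum())                 # 3, the same count via summation
print(is_heads.mean())                # 0.6, the proportion of heads
```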

In [122]:
### for an array created with make_array, 
### what would the result of this operation be?

make_array(0, 5, 2) * 2

array([ 0, 10,  4])
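
Arithmetic and comparison follow the same elementwise rule: the single value is applied to every element of the array. A sketch with plain NumPy, reusing the numbers from the cell above (`np.array` stands in for `make_array`):

```python
import numpy as np

values = np.array([0, 5, 2])

doubled = values * 2        # each element multiplied by 2
above_one = values > 1      # each element compared with 1

print(doubled)
print(above_one)
```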