# 10.4. Random Sampling in Python

## 10.4.1. Review: Sampling from a Population in a Table

- sample() draws rows **uniformly** at random **with replacement** by default. Uniform sampling is the natural model for chance experiments such as rolling a die.

In [2]:
import numpy as np
from datascience import *

faces = np.arange(1, 7)
die = Table().with_columns('Face', faces)
die

Face
1
2
3
4
5
6


In [3]:
##### sample the die 7 times
die.sample(7)

Face
6
2
4
6
5
3
3


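In [119]:
##### a single roll drawn with np.random.choice (not used below)
die1 = np.random.choice(die.column('Face'))

##### 1000 rolls drawn one at a time with Table.sample, then the median
die1_arr = make_array()
for i in range(1000):
    # sample(1) returns a one-row table; read the value in its 'Face' column
    die1_arr = np.append(die1_arr, die.sample(1).column('Face').item(0))

np.median(die1_arr)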

3.0

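In [123]:
import random

##### a single roll drawn with Table.sample (not used below)
die2 = die.sample(1).column('Face').item(0)

##### 1000 rolls drawn one at a time with random.choice, then the median
die2_arr = make_array()
for i in range(1000):
    die2_arr = np.append(die2_arr, random.choice(die.column('Face')))

np.median(die2_arr)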

3.0
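
The two loops above grow the array one roll at a time. As a minimal sketch, assuming only NumPy is available (the names `rolls` and `median_roll` are illustrative and do not come from the original notebook), the same 1000 rolls can be drawn in a single call:

```python
import numpy as np

faces = np.arange(1, 7)

# Draw 1000 rolls at once; np.random.choice samples uniformly
# and with replacement by default.
rolls = np.random.choice(faces, 1000)

# For a fair die the median of many rolls usually lands at 3.0, 3.5, or 4.0.
median_roll = np.median(rolls)
print(median_roll)
```

This avoids calling np.append inside a loop, which copies the whole array on every iteration.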

- Sometimes it is more natural to sample individuals at random without replacement.
- This is called a simple random sample.
- The argument with_replacement=False allows you to do this.
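
The difference between the two sampling schemes can be sketched with NumPy alone, where `replace=False` plays the role that with_replacement=False plays for tables. The ten-element `population` array is a hypothetical stand-in for a table of individuals:

```python
import numpy as np

# Hypothetical population of 10 labelled individuals.
population = np.arange(1, 11)

# With replacement: the same individual may appear more than once.
with_repl = np.random.choice(population, 5)

# Without replacement: a simple random sample, all individuals distinct.
without_repl = np.random.choice(population, 5, replace=False)

print(with_repl)
print(without_repl)

# Without replacement, the sample cannot be larger than the population.
try:
    np.random.choice(population, 11, replace=False)
    oversample_failed = False
except ValueError:
    oversample_failed = True
print(oversample_failed)
```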

In [4]:
path_data = '../../data/'
actors = Table.read_table(path_data + 'actors.csv')
actors

| Actor | Total Gross | Number of Movies | Average per Movie | #1 Movie | Gross |
| --- | --- | --- | --- | --- | --- |
| Harrison Ford | 4871.7 | 41 | 118.8 | Star Wars: The Force Awakens | 936.7 |
| Samuel L. Jackson | 4772.8 | 69 | 69.2 | The Avengers | 623.4 |
| Morgan Freeman | 4468.3 | 61 | 73.3 | The Dark Knight | 534.9 |
| Tom Hanks | 4340.8 | 44 | 98.7 | Toy Story 3 | 415.0 |
| Robert Downey, Jr. | 3947.3 | 53 | 74.5 | The Avengers | 623.4 |
| Eddie Murphy | 3810.4 | 38 | 100.3 | Shrek 2 | 441.2 |
| Tom Cruise | 3587.2 | 36 | 99.6 | War of the Worlds | 234.3 |
| Johnny Depp | 3368.6 | 45 | 74.9 | Dead Man's Chest | 423.3 |
| Michael Caine | 3351.5 | 58 | 57.8 | The Dark Knight | 534.9 |
| Scarlett Johansson | 3341.2 | 37 | 90.3 | The Avengers | 623.4 |


In [14]:
##### Simple random sample of 5 rows
actors.sample(5, with_replacement=False)

| Actor | Total Gross | Number of Movies | Average per Movie | #1 Movie | Gross |
| --- | --- | --- | --- | --- | --- |
| Tommy Lee Jones | 2681.3 | 46 | 58.3 | Men in Black | 250.7 |
| Brad Pitt | 2680.9 | 40 | 67.0 | World War Z | 202.4 |
| Samuel L. Jackson | 4772.8 | 69 | 69.2 | The Avengers | 623.4 |
| Sandra Bullock | 2462.6 | 35 | 70.4 | Minions | 336.0 |
| Jim Carrey | 2545.2 | 27 | 94.3 | The Grinch | 260.0 |


## 10.4.2. Review: Sampling from a Population in an Array

In [6]:
##### np.random.choice

faces

array([1, 2, 3, 4, 5, 6])

In [7]:
##### 7 rolls of the die
np.random.choice(faces, 7)

array([5, 3, 2, 5, 1, 6, 3])

In [8]:
##### array of actor names

actor_names = actors.column('Actor')

In [9]:
##### simple random sample of 5 actor names

np.random.choice(actor_names, 5, replace=False)

array(['Michael Caine', 'Tom Cruise', 'Robin Williams', 'Gary Oldman',
       'Harrison Ford'],
      dtype='<U22')

## 10.4.3. Sampling from a Categorical Distribution

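#### Categorical attribute of the sampled individuals

- sample_proportions(sample_size, proportions) takes a sample size and the category proportions of the population, and returns the proportion of a random sample that falls in each category.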

In [10]:
# Species distribution of flower colors:
# Proportions are in the order Red, Pink, White

species_proportions = [0.25, 0.5, .25]
sample_size = 300

# Distribution of sample
sample_distribution = sample_proportions(sample_size, species_proportions)
sample_distribution

array([ 0.27333333,  0.46666667,  0.26      ])

In [11]:
sum(sample_distribution)

1.0

In [12]:
### sample proportion of Pink flowers (index 1 in the order Red, Pink, White)

sample_distribution.item(1)

0.4666666666666667
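
To make the sampling step concrete, here is a sketch of one way to get the same kind of result with NumPy alone. It assumes a multinomial draw is an acceptable stand-in for sample_proportions. The helper name `sample_proportions_sketch` is hypothetical and is not presented as the datascience implementation:

```python
import numpy as np

def sample_proportions_sketch(sample_size, proportions):
    """Hypothetical stand-in: category proportions in one random sample."""
    # Count how many of the sample_size draws fall in each category.
    counts = np.random.multinomial(sample_size, proportions)
    # Convert counts to proportions of the sample.
    return counts / sample_size

species_proportions = [0.25, 0.5, 0.25]  # Red, Pink, White
sketch_distribution = sample_proportions_sketch(300, species_proportions)
print(sketch_distribution)

# Proportion of Pink flowers in this sample (index 1).
pink_proportion = sketch_distribution.item(1)
print(pink_proportion)
```

Because the counts always add up to the sample size, the returned proportions sum to 1 (up to floating-point rounding), which matches the check with sum() above.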