# 10.4. Random Sampling in Python

## 10.4.1. Review: Sampling from a Population in a Table

- By default, sample() draws rows **uniformly** at random **with replacement**. Uniform sampling is the natural model for chance experiments such as rolling a die.

In [5]:
import numpy as np
from datascience import *

faces = np.arange(1, 7)
die = Table().with_columns('Face', faces)
die

Face
1
2
3
4
5
6


In [7]:
##### roll the die 7 times (sample 7 rows with replacement)
die.sample(7)

Face
5
4
2
2
3
3
1


- Sometimes it is more natural to sample individuals at random without replacement.
- A sample drawn this way is called a simple random sample.
- Passing with_replacement=False to sample() draws one.
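
One way to picture a simple random sample of rows is as a choice of distinct row indices, with each chosen index pulling out a whole row. The sketch below illustrates that idea with plain NumPy arrays instead of a Table; it is not how sample() is implemented. The variable names are illustrative, the values are the first few rows of the actors table, and np.random.choice is the function reviewed in the next section.

```python
import numpy as np

# Two "columns" holding a few rows from the actors table.
names = np.array(['Harrison Ford', 'Samuel L. Jackson', 'Morgan Freeman',
                  'Tom Hanks', 'Eddie Murphy'])
movies = np.array([41, 69, 61, 44, 38])

# Choose 3 distinct row indices: no row can be picked twice.
row_indices = np.random.choice(len(names), 3, replace=False)

# Index every column with the same indices so each row stays intact.
sampled_names = names[row_indices]
sampled_movies = movies[row_indices]
print(sampled_names, sampled_movies)
```

Because both columns are indexed with the same row_indices, each sampled name stays paired with its own movie count.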

In [10]:
path_data = '../data/'
actors = Table.read_table(path_data + 'actors.csv')
actors

Actor,Total Gross,Number of Movies,Average per Movie,#1 Movie,Gross
Harrison Ford,4871.7,41,118.8,Star Wars: The Force Awakens,936.7
Samuel L. Jackson,4772.8,69,69.2,The Avengers,623.4
Morgan Freeman,4468.3,61,73.3,The Dark Knight,534.9
Tom Hanks,4340.8,44,98.7,Toy Story 3,415.0
"Robert Downey, Jr.",3947.3,53,74.5,The Avengers,623.4
Eddie Murphy,3810.4,38,100.3,Shrek 2,441.2
Tom Cruise,3587.2,36,99.6,War of the Worlds,234.3
Johnny Depp,3368.6,45,74.9,Dead Man's Chest,423.3
Michael Caine,3351.5,58,57.8,The Dark Knight,534.9
Scarlett Johansson,3341.2,37,90.3,The Avengers,623.4


In [11]:
##### Simple random sample of 5 rows
actors.sample(5, with_replacement=False)

Actor,Total Gross,Number of Movies,Average per Movie,#1 Movie,Gross
Jonah Hill,2605.1,29,89.8,The LEGO Movie,257.8
Ben Stiller,2827.0,37,76.4,Meet the Fockers,279.3
Matt Damon,3107.3,39,79.7,The Martian,228.4
Stanley Tucci,3123.9,50,62.5,Catching Fire,424.7
Mark Wahlberg,2549.8,36,70.8,Transformers 4,245.4


## 10.4.2. Review: Sampling from a Population in an Array

In [14]:
##### np.random.choice

faces

array([1, 2, 3, 4, 5, 6])

In [15]:
##### 7 rolls of the die
np.random.choice(faces, 7)

array([6, 2, 5, 6, 3, 5, 4])

In [16]:
##### array of actor names

actor_names = actors.column('Actor')

In [18]:
##### simple random sample of 5 actor names

np.random.choice(actor_names, 5, replace=False)

array(['Daniel Radcliffe', 'Will Smith', 'Philip Seymour Hoffman',
       'Tommy Lee Jones', 'Johnny Depp'],
      dtype='<U22')
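
The two calls above differ only in the replace argument, so it may help to see the consequences side by side. This is a minimal sketch using the same faces array: with replacement (the default) repeats can occur and the sample may be larger than the population, while with replace=False every drawn value is distinct and asking for more values than the population holds raises a ValueError.

```python
import numpy as np

faces = np.arange(1, 7)

# With replacement (the default): repeats are possible, and the
# sample can be larger than the population of 6 faces.
rolls = np.random.choice(faces, 10)

# Without replacement: all drawn values are distinct.
srs = np.random.choice(faces, 6, replace=False)

# Asking for 7 distinct values out of 6 is impossible.
try:
    np.random.choice(faces, 7, replace=False)
    too_large_failed = False
except ValueError:
    too_large_failed = True

print(rolls, srs, too_large_failed)
```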

## 10.4.3. Sampling from a Categorical Distribution

#### Distribution of a categorical attribute among the sampled individuals

In [19]:
# Species distribution of flower colors:
# Proportions are in the order Red, Pink, White
species_proportions = [0.25, 0.5, 0.25]

sample_size = 300

# Distribution of sample
sample_distribution = sample_proportions(sample_size, species_proportions)
sample_distribution

array([ 0.26666667,  0.49333333,  0.24      ])

In [20]:
sum(sample_distribution)

1.0

In [24]:
### sample proportion of Pink flowers (index 1 in the order Red, Pink, White)

sample_distribution.item(1)

0.49333333333333335
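
To see what a call like sample_proportions(sample_size, species_proportions) computes, here is a rough stand-in written with NumPy only. It assumes the draw behaves like a multinomial sample: count how many of the sampled individuals fall in each category, then divide by the sample size. The function name sample_proportions_sketch is hypothetical and is not part of the datascience library.

```python
import numpy as np

def sample_proportions_sketch(sample_size, probabilities):
    """Hypothetical stand-in: category proportions in one random sample."""
    # Number of sampled individuals that land in each category.
    counts = np.random.multinomial(sample_size, probabilities)
    # Convert counts to proportions of the sample.
    return counts / sample_size

# Proportions are in the order Red, Pink, White.
species_proportions = [0.25, 0.5, 0.25]
sample_distribution = sample_proportions_sketch(300, species_proportions)

# Proportion of Pink flowers in the sample (index 1).
pink_proportion = sample_distribution.item(1)
print(sample_distribution, pink_proportion)
```

The counts always add up to the sample size, which is why the sample proportions sum to 1.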