In [124]:
# import the necessary packages
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Basics

## Indexing

In Python, when an object is a collection (such as a list, tuple, or string), you can select items from it by putting the *index* of the item (or a range of indices) in square brackets.

In [125]:
my_list = [4,8,15,16,23,42]

# grab 15 from the list above

my_list[2] # remember indexing begins at 0

15

In [126]:
my_string = "Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated."

# grab "better" from my_string

my_string[13:19]

'better'
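The cell above uses a *slice*, `start:stop`, instead of a single index. Here is a small sketch of how slices behave, using a made-up string `letters`:

```python
letters = "abcdefg"

# start:stop grabs indices start up to, but NOT including, stop
print(letters[2:5])   # cde

# leaving out start means "begin at index 0"
print(letters[:3])    # abc

# an optional third number is the step; 2 takes every other item
print(letters[::2])   # aceg
```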

In [127]:
my_tuple = (50,100,25)

# grab 50 from my_tuple

my_tuple[0]

50

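Dictionaries store key-value pairs, and you access a value by its key rather than by a numeric position. (Since Python 3.7, dictionaries remember the order in which keys were inserted, but you still can't grab an item by its position.) To grab a value from a dictionary, put the key for the value you'd like in square brackets.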

In [128]:
my_dict = {"Happy": True, "Age": 45, "Name": "Janice", "Children": ["Ted", "Alan", "Jake"]}

# grab name of 3rd child, Jake, from my_dict

my_dict["Children"][2] # from the value stored under the key "Children", grab the item at index 2

'Jake'
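To see why position numbers don't work on a dictionary, here is a minimal sketch with a made-up dictionary `pet`: a number inside the brackets is treated as a *key*, so asking for `0` fails unless `0` actually is a key.

```python
pet = {"name": "Rex", "age": 3}

# look up a value by its key
print(pet["name"])  # Rex

# 0 is treated as a key, not a position; there is no key 0, so Python raises KeyError
try:
    pet[0]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 0
```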

## Your Turn

From the object `CPSC392`, grab the average quiz grade for Carly.
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [129]:
CPSC392 = {"Jimmy": {"QuizGrades": [9,5,9,8,7], "Major": "Computer Science"},
          "Brenda": {"QuizGrades": [9,10,9,10,7], "Major": "Business"},
          "Jacqueline": {"QuizGrades": [3,6,2,8,9], "Major": "Computer Science"},
          "Bethany": {"QuizGrades": [2,2,0,4,5], "Major": "Business"},
          "Kristen": {"QuizGrades": [9,7,9,9,9], "Major": "Computer Science"},
          "Elissa": {"QuizGrades": [4,4,5,8,2], "Major": "Foreign Languages"},
          "Carly": {"QuizGrades": [7,6,8,7,9], "Major": "Biology"}}

###

## Range

You can use the `range()` function to get a sequence of integers. `range()` takes up to 3 arguments:

- `start`: which number to start at (default is 0, so `range(5)` starts at 0)
- `stop`: which number to stop at (this number itself is not included)
- `step`: what increment to step by (default is 1; a step of 2 would give you every other integer)

In [130]:
r = range(0,100,2)

r_list = list(r) # just to print it out

print(r_list)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]


In [131]:
s = range(10, 24)

s_list = list(s)

print(s_list)

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
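A few more cases of the arguments described above, as a quick sketch. Note that `range()` only accepts its arguments by position, so writing `step=2` as a keyword raises an error.

```python
# only stop given: start defaults to 0 and step defaults to 1
print(list(range(5)))           # [0, 1, 2, 3, 4]

# start, stop, and a step of 3
print(list(range(1, 10, 3)))    # [1, 4, 7]

# a negative step counts down; stop (0) is still excluded
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]

# range() takes no keyword arguments, so this raises TypeError
try:
    range(0, 10, step=2)
except TypeError as err:
    print("TypeError:", err)
```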


# Functions

If you need to review: [Video 1](https://www.youtube.com/watch?v=dU5nyLfq7J0), [Video 2](https://www.youtube.com/watch?v=vp50OR-gwTY)

Functions are a way to write reusable code that can be called over and over in different circumstances: a function is a named group of statements that performs a specific calculation on whatever inputs it is given. We change those inputs through *arguments*: the function definition names the arguments, and each call supplies their values. For example, in the code below, which defines the function `square()`, the argument `n` tells us which number to square:

In [132]:
def square(n):
    return(n*n)

## Arguments
The argument `n` is a variable which will hold whatever number we give the function when calling it. For example, if I want to square the number 7 I would use this code:

In [133]:
square(7)

49

In this case, `n` will be equal to `7`, and every time the function's code references `n`, it will use `7`. We could also use other numbers. For example:

In [134]:
square(n = 12)

144

Notice that you can explicitly tell Python which argument you're setting when calling the function by writing the name of the argument (here: `n`), then `=`, then the value for that argument (here: `12`).


## Default Arguments
When *defining* a function, you can set *default arguments*, which are values that the function can use if the user does not provide values for that specific argument. For example, in our `square()` function, we can specify that if the user does not provide a value for `n`, then the function should use `n = 1`. We set default arguments by giving the argument a value when *defining* the function.

In [135]:
def square(n = 1):
    return(n*n)

When defining functions, arguments with NO default must come before arguments with a default. For example in this function that multiplies two numbers, `a` and `b`, the argument `a` must appear first in the parentheses when defining `mult()` because it does NOT have a default value, while `b` does.

In [136]:
def mult(a, b = 2):
    return(a * b)

Default arguments allow users to call the function without specifying a value for that argument. For example, if we called `square()` as defined in this section *without* an argument, it would return `1`, because it uses the default value of 1 when no value for `n` is given.
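A quick check of the defaults described above, reusing the `square()` and `mult()` definitions from this section:

```python
def square(n=1):
    return n * n

def mult(a, b=2):
    return a * b

print(square())       # no n given, so n = 1 -> 1
print(square(4))      # 16
print(mult(3))        # b falls back to its default of 2 -> 6
print(mult(3, b=5))   # an explicit value overrides the default -> 15
```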

## Calling vs. Defining Functions

When you define a function, you're giving Python instructions about what to do *if* you ever call that function. That is why nothing is output when you run a cell containing only a `def` statement: Python just *stores* the directions for later. To actually execute the function, you need to *call* it.

For example, when you run the cell below, nothing will be output.

In [137]:
# FUNCTION DEFINITION

def censorDang(sentence):
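    # this function takes in a sentence as a string and returns the same
    # sentence with every occurrence of the word "dang" (any capitalization) censored.
    # note: split() keeps punctuation attached, so a word like "dang," is not caught.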
    
    sentence_list = sentence.split()
    
    for i in range(0,len(sentence_list)):
        if sentence_list[i].lower() == "dang":
            sentence_list[i] = "****"
            
    return(" ".join(sentence_list))

But when you call the function in the following cell, python will actually execute the code and there will be an output.

In [138]:
## FUNCTION CALL

censorDang("Dang this is good tea.")

'**** this is good tea.'

Remember that when you're writing a function, the arguments are just placeholders or variables. They don't refer to specific objects/values, they're meant to be malleable. So for example, when I wrote `censorDang()`, I didn't write it with a *specific* sentence in mind. The function should work for any sentence!
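To see that `censorDang()` isn't tied to one sentence, here it is again (same logic as above) called on a few different inputs. One limitation worth knowing: `.split()` only splits on whitespace, so punctuation stays attached to a word and `"dang,"` is not recognized.

```python
def censorDang(sentence):
    # same logic as the definition above
    sentence_list = sentence.split()
    for i in range(0, len(sentence_list)):
        if sentence_list[i].lower() == "dang":
            sentence_list[i] = "****"
    return " ".join(sentence_list)

print(censorDang("DANG it all"))                # **** it all
print(censorDang("No bad words here"))          # No bad words here
print(censorDang("Well dang, that was close"))  # "dang," keeps its comma, so it is NOT censored
```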

## Your Turn

Write a function, `max_list()` that takes in a list of numbers as an argument, and returns the maximum value of that list.

<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [139]:
###

Write a function, `q_finder()` that takes in a list of words/strings as an argument, and returns a list of only the words that contain q (upper OR lower case).

<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [140]:
###

Write a function `cluster_subsetter()` that takes in a data frame `df` (see example below) and a string `cluster` as arguments, and returns a data frame containing only the rows that are in the cluster specified by `cluster`.

<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [141]:
d = pd.DataFrame({"x": np.random.uniform(0,1,size = 100),
                 "y" : np.random.uniform(0,1,size = 100),
                 "cluster" : np.repeat(["A", "B", "C"], [50,20,30]) })
d
# cluster_subsetter(d,"A") should return only the rows in d that belong in cluster "A"


###

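|     | x        | y        | cluster |
|-----|----------|----------|---------|
| 0   | 0.823308 | 0.779586 | A       |
| 1   | 0.775427 | 0.082398 | A       |
| 2   | 0.559344 | 0.060296 | A       |
| 3   | 0.043983 | 0.153597 | A       |
| 4   | 0.415714 | 0.873494 | A       |
| ... | ...      | ...      | ...     |
| 95  | 0.581292 | 0.828014 | C       |
| 96  | 0.835584 | 0.668196 | C       |
| 97  | 0.322225 | 0.463567 | C       |
| 98  | 0.018316 | 0.976173 | C       |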


# List Comprehension

## List Comp Intro

List comprehension is a compact way to build a list by iterating over a collection; it is essentially an alternative to writing a for loop.

If a for loop looks like this:

In [142]:
my_list = []

for i in range(0,10):
    my_list.append(i**2)

then the list comprehension would look like this:

In [143]:
my_list2 = [i**2 for i in range(0,10)]
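Both cells build the same list. The general pattern is `[expression for item in iterable]`, where the expression is whatever would go inside `.append()` in the loop version. A self-contained check:

```python
# loop version
my_list = []
for i in range(0, 10):
    my_list.append(i**2)

# list comprehension version: [expression for item in iterable]
my_list2 = [i**2 for i in range(0, 10)]

print(my_list == my_list2)  # True
print(my_list2)             # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```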

## Other examples:

### making a list of every character in a string, converted to lower case

In [144]:
# for loop
censored_list = []

s = "The rain in Spain falls mainly in the plains"

for letter in s:
    censored_list.append(letter.lower())

In [145]:
# list comp

censored_list2 = [letter.lower() for letter in s]

censored_list == censored_list2

True

### calculating factorials for a bunch of numbers


In [146]:
# for loop

factorials = []

def factorial(n):
    mult = range(1,n+1)
    p = 1
    for i in mult:
        p = p * i
    return(p)
        
n = [2,5,6,10,144]
fac = []
for num in n:
    fac.append(factorial(num))

In [147]:
# list comp

fac2 = [factorial(num) for num in n]

fac == fac2

True

### multiplying all possible combos of items from two lists together
You can even combine two for loops into one list comprehension!



In [148]:
# for loop

a = [1,2,3,4,5]
b = [6,7,3,4,-1]

mults = []

for i in a:
    for j in b:
        mults.append(i*j)

In [149]:
# list comp

mult2 = [i*j for i in a for j in b]

mults == mult2 # compare against `mults`, the list built by the for loop

True
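In a comprehension with two `for` clauses, the clauses read left to right in the same order as the nested loops: the first `for` is the outer loop. A small sketch that makes the order visible by building pairs:

```python
# the left-most for is the OUTER loop, just like the nested loop version
pairs = [(i, j) for i in [1, 2] for j in ["a", "b"]]
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

# equivalent nested loops
pairs_loop = []
for i in [1, 2]:
    for j in ["a", "b"]:
        pairs_loop.append((i, j))

print(pairs == pairs_loop)  # True
```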

### flattening a list of lists into a single list


In [150]:
# for loop

a = [[1,2,3,4,42,4,3,2],[4,2,3,9,5,83,8,2,9,0,3], [4,8,15,16,23,42]]

newList = []
for sub in a:
    for i in sub:
        newList.append(i)

In [151]:
# list comp
newList2 = [i for sub in a for i in sub]

newList == newList2

True

### making a list of all possible playing cards

In [152]:
suits = ["Hearts", "Spades", "Diamond", "Clubs"]
cards = ["A","K","Q","J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]

deck = []
for suit in suits:
    for card in cards:
        deck.append(card + " of " + suit) # same "A of Hearts" format as the list comp below

# list comp

deck2 = [card + " of " + suit for suit in suits for card in cards]

print(deck2)

deck == deck2

['A of Hearts', 'K of Hearts', 'Q of Hearts', 'J of Hearts', '10 of Hearts', '9 of Hearts', '8 of Hearts', '7 of Hearts', '6 of Hearts', '5 of Hearts', '4 of Hearts', '3 of Hearts', '2 of Hearts', 'A of Spades', 'K of Spades', 'Q of Spades', 'J of Spades', '10 of Spades', '9 of Spades', '8 of Spades', '7 of Spades', '6 of Spades', '5 of Spades', '4 of Spades', '3 of Spades', '2 of Spades', 'A of Diamond', 'K of Diamond', 'Q of Diamond', 'J of Diamond', '10 of Diamond', '9 of Diamond', '8 of Diamond', '7 of Diamond', '6 of Diamond', '5 of Diamond', '4 of Diamond', '3 of Diamond', '2 of Diamond', 'A of Clubs', 'K of Clubs', 'Q of Clubs', 'J of Clubs', '10 of Clubs', '9 of Clubs', '8 of Clubs', '7 of Clubs', '6 of Clubs', '5 of Clubs', '4 of Clubs', '3 of Clubs', '2 of Clubs']


True

### Words with e's

You can also include a condition (an `if` clause) in your list comprehension to keep only the items you want.

In [153]:
words = ["Hello", "Mother", "hello", "father", "fleas", "ticks", "mosquitos", "really", "bother"]

es = []

for word in words:
    if "e" in word.lower():
        es.append(word)

In [154]:
es2 = [word for word in words if "e" in word.lower()]

es == es2

True
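The trailing `if` above acts as a *filter*. If you instead want to keep every item but transform it differently depending on a condition, the `if`/`else` goes *before* the `for`, as a conditional expression. A short sketch with a shortened word list:

```python
words = ["Hello", "Mother", "ticks"]

# filter: a trailing if keeps only the words that contain an e
with_e = [word for word in words if "e" in word.lower()]
print(with_e)  # ['Hello', 'Mother']

# conditional expression: if/else before the for keeps every word but labels it
labels = ["has e" if "e" in word.lower() else "no e" for word in words]
print(labels)  # ['has e', 'has e', 'no e']
```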

### Using list comp with sklearn

You can use list comprehension with all sorts of functions. For example, you can use it to create a bunch of KMeans models and calculate their silhouette scores.

In [155]:
data = pd.read_csv("https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/programmers2.csv")


kms = [KMeans(n_clusters = k).fit(data) for k in range(2,10)]

silhouettes = [silhouette_score(data,model.predict(data)) for model in kms]

# silhouettes = [silhouette_score(data,KMeans(n_clusters = k).fit_predict(data)) for k in range(2,10)]

print(silhouettes)

print("\nThe maximum silhouette score is: ", max(silhouettes))

[0.4342031414710615, 0.5368423138393504, 0.6513061716958818, 0.616849785320789, 0.5466131448314617, 0.5049064016625989, 0.45263839701273045, 0.35389348907536333]

The maximum silhouette score is:  0.6513061716958818
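Since `silhouettes` is in the same order as `range(2,10)`, you can pair each score with its number of clusters to see which `k` gave the best score. A minimal sketch that hard-codes the scores printed above (they come from randomly initialized models, so a rerun may give slightly different numbers):

```python
ks = list(range(2, 10))
# scores copied from the output above, in the same order as ks
silhouettes = [0.4342031414710615, 0.5368423138393504, 0.6513061716958818,
               0.616849785320789, 0.5466131448314617, 0.5049064016625989,
               0.45263839701273045, 0.35389348907536333]

# zip pairs each k with its score; max with key= picks the pair with the highest score
best_k, best_score = max(zip(ks, silhouettes), key=lambda pair: pair[1])
print(best_k, best_score)  # 4 0.6513061716958818
```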


## Your Turn

Use the `prime()` function below, together with list comprehension, to create a list called `primes` that contains all the prime numbers between 3 and 1000.
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [None]:
def prime(n = 10):
    if n < 2:
        return(False)
    if n == 2:
        return(True)
    for div in range(2,n):
        if n%div == 0:
            return(False)
    return(True)

primes = ###


Use list comprehension to turn this list of numbers into a list of strings (for example, if the list is [1,2,3] you want to return ["1", "2", "3"]).
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [None]:
nums = [1,1,3,5,8,13]
string_nums = ###

Use list comprehension to create a list of ONLY words from `sentence` that have an even number of letters.
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [None]:
sentence = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur Excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deserunt mollit anim id est laborum"

even_words = ###


# Pandas Data Frames

A data frame is a way to store data in Python. It's similar to a single spreadsheet, which contains rows (observations) and columns (features).

You can grab the size of the data frame with `.shape`, which gives you a tuple of (number of rows, number of columns).

In [157]:
data = pd.read_csv("https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/KMEM4.csv")

data.shape

(600, 2)
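Because `.shape` is a tuple, you can unpack it into two variables. A sketch with a small made-up data frame `toy` (not the KMEM4 data):

```python
import pandas as pd

toy = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

n_rows, n_cols = toy.shape  # .shape is an attribute, so no parentheses
print(n_rows, n_cols)  # 3 2
```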

To see the first few rows of your data frame (5 by default), use `.head()`.

In [158]:
data.head()

|   | x         | y         |
|---|-----------|-----------|
| 0 | -0.006848 | 0.395527  |
| 1 | 0.31482   | -0.289261 |
| 2 | 0.171705  | 1.078077  |
| 3 | -1.203661 | 1.325926  |
| 4 | -0.179379 | -0.036615 |
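`.head()` also takes an optional number of rows to show. A sketch on a made-up data frame `toy`:

```python
import pandas as pd

toy = pd.DataFrame({"x": range(10), "y": range(10, 20)})

print(toy.head())   # first 5 rows (the default)
print(toy.head(2))  # only the first 2 rows
```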


There are several ways to grab a column from a data frame:

In [159]:
data.x

0     -0.006848
1      0.314820
2      0.171705
3     -1.203661
4     -0.179379
         ...   
595    4.189792
596   -4.249038
597    3.411690
598    3.629434
599   -4.904774
Name: x, Length: 600, dtype: float64

In [160]:
data["x"]

0     -0.006848
1      0.314820
2      0.171705
3     -1.203661
4     -0.179379
         ...   
595    4.189792
596   -4.249038
597    3.411690
598    3.629434
599   -4.904774
Name: x, Length: 600, dtype: float64

In [161]:
data.iloc[:,0]

0     -0.006848
1      0.314820
2      0.171705
3     -1.203661
4     -0.179379
         ...   
595    4.189792
596   -4.249038
597    3.411690
598    3.629434
599   -4.904774
Name: x, Length: 600, dtype: float64

In [162]:
data.loc[:, "x"]

0     -0.006848
1      0.314820
2      0.171705
3     -1.203661
4     -0.179379
         ...   
595    4.189792
596   -4.249038
597    3.411690
598    3.629434
599   -4.904774
Name: x, Length: 600, dtype: float64
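All four approaches above return the same column as a pandas Series. A quick self-contained check on a made-up data frame `toy`:

```python
import pandas as pd

toy = pd.DataFrame({"x": [1.5, -2.0, 3.25], "y": [0.0, 1.0, 2.0]})

by_attr = toy.x             # attribute access (works only for names that are valid Python identifiers)
by_key = toy["x"]           # square brackets with the column name
by_pos = toy.iloc[:, 0]     # every row, column at position 0
by_label = toy.loc[:, "x"]  # every row, column labeled "x"

print(by_attr.equals(by_key), by_key.equals(by_pos), by_pos.equals(by_label))  # True True True
```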

To access rows from a data frame, we'll often use `.loc[]` or `.iloc[]`. You can remember the difference by telling yourself that the `i` in `.iloc[]` stands for integer/index, because `.iloc[]` takes integer positions, whereas `.loc[]` takes labels (like strings) and booleans.

Let's use `.iloc[]` to grab rows 19-25 (assuming the first row is 0) from `data`.

In [163]:
data.iloc[19:26, :] # remember it STARTS from the first number and goes up to BUT NOT INCLUDING the second

|    | x         | y         |
|----|-----------|-----------|
| 19 | -0.746738 | -0.440733 |
| 20 | 1.524822  | -1.256318 |
| 21 | -0.728119 | -0.01501  |
| 22 | 0.914411  | -1.748031 |
| 23 | -0.491472 | 1.115153  |
| 24 | 0.597467  | -1.437617 |
| 25 | -0.875947 | 0.58245   |
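One difference that often trips people up: `.iloc[]` slices by position and excludes the stop, while `.loc[]` slices by label and *includes* the stop label. A sketch with a made-up data frame `toy` whose index labels are letters:

```python
import pandas as pd

toy = pd.DataFrame({"x": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# positions 1 and 2 (stop position 3 is excluded) -> rows "b" and "c"
print(toy.iloc[1:3])

# labels "b" through "d", and the stop label IS included -> rows "b", "c", "d"
print(toy.loc["b":"d"])
```

With the default integer index that `data` has, this means `data.loc[19:25]` returns the same rows as `data.iloc[19:26]`.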


Now let's grab only the rows where both x > 3 and y > 3.

In [164]:
gt3 = (data.x > 3) & (data.y > 3)

data.loc[gt3]

|     | x        | y        |
|-----|----------|----------|
| 128 | 3.457686 | 3.200131 |
| 140 | 3.40891  | 3.062241 |
| 201 | 3.067726 | 3.22108  |
| 272 | 3.711039 | 3.285028 |
| 313 | 3.141992 | 3.761917 |
| 361 | 3.706717 | 3.030354 |
| 422 | 3.486926 | 3.055562 |
| 436 | 3.00248  | 3.544937 |
| 471 | 3.01258  | 3.290013 |
| 554 | 3.275193 | 3.012374 |
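Each comparison produces a Series of `True`/`False` values, and `&` combines them element by element. The parentheses matter because `&` binds more tightly than `>`. Python's `and` does not work here: it tries to turn a whole Series into a single `True`/`False`, which pandas refuses with a `ValueError`. A sketch on a made-up data frame `toy`:

```python
import pandas as pd

toy = pd.DataFrame({"x": [1, 4, 5], "y": [4, 2, 6]})

mask = (toy.x > 3) & (toy.y > 3)  # element-wise AND of two boolean Series
print(mask.tolist())              # [False, False, True]
print(toy.loc[mask])              # only the row where both conditions hold

try:
    toy.loc[(toy.x > 3) and (toy.y > 3)]
except ValueError as err:
    print("ValueError:", err)     # pandas refuses to treat a whole Series as one True/False
```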


There are TONS of useful data frame functions, so I'll demonstrate just a few:


In [165]:
# grab the mean of columns

data.mean()

x   -0.125429
y    0.695387
dtype: float64

In [166]:
# grab the max of the columns

data.max()

x    4.976464
y    4.979621
dtype: float64

In [167]:
# what columns are in my df?

data.columns

Index(['x', 'y'], dtype='object')

In [168]:
# drop missing values

data = data.dropna()

In [169]:
# group the data frame by cluster assignment and then get the mean for each cluster

prog = pd.read_csv("https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/programmers.csv")
prog["Assignment"] = np.repeat(["A", "B", "C", "D", "E"],50)

prog.groupby("Assignment").mean() # rows are each cluster, columns represent the different features

| Assignment | py        | r         | c         | sql       | js        |
|------------|-----------|-----------|-----------|-----------|-----------|
| A          | 89.750298 | 75.70491  | 18.941242 | 24.833986 | 5.055579  |
| B          | 69.799266 | 69.569211 | 10.8919   | 64.164052 | 9.186321  |
| C          | 96.017893 | 21.015196 | 87.05753  | 22.728009 | 31.075785 |
| D          | 18.554134 | 22.925694 | 29.934704 | 33.196274 | 18.982596 |
| E          | 63.830289 | 20.361404 | 14.394041 | 87.165838 | 92.782235 |


## Your Turn

Using the pandas skills you've learned in class and reviewed here, what is the mean danceability for each artist in the `popDivas` dataset? Who has the highest average danceability?
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [170]:
popDivas = pd.read_csv("https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/PopDivas_data.csv")
popDivas.head()

###

|   | Unnamed: 0 | artist_name | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | duration_ms | track_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Beyoncé | 0.386 | 0.288 | 1 | -18.513 | 1 | 0.0602 | 0.533 | 0.0167 | 0.141 | 0.399 | 43850 | balance (mufasa interlude) |
| 1 | 2 | Beyoncé | 0.484 | 0.363 | 5 | -8.094 | 0 | 0.0368 | 0.645 | 0.0 | 0.125 | 0.201 | 226479 | BIGGER |
| 2 | 3 | Beyoncé | 0.537 | 0.247 | 2 | -17.75 | 1 | 0.0793 | 0.199 | 1e-05 | 0.423 | 0.17 | 46566 | the stars (mufasa interlude) |
| 3 | 4 | Beyoncé | 0.672 | 0.696 | 4 | -6.693 | 0 | 0.177 | 0.2 | 0.0275 | 0.0736 | 0.642 | 162353 | FIND YOUR WAY BACK |
| 4 | 5 | Beyoncé | 0.0 | 0.00515 | 9 | -22.612 | 0 | 0.0 | 0.524 | 0.95 | 0.114 | 0.0 | 13853 | uncle scar (scar interlude) |


Grab only the songs that are by Beyoncé (spelled with an accent in the data) or Britney Spears, and have an energy score above 0.5.
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [None]:
###

# Math with Arrays

NumPy arrays allow us to do *vectorized* operations, which are applied elementwise to each item in an array without writing a loop. For example, if we want to get the square of every number in an array, we can write `array**2`. You can see below that calling `**2` on the array `x` squares each item in `x`.

In [171]:
x = np.array([1,2,3,4,5,10])

x**2

array([  1,   4,   9,  16,  25, 100])
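For comparison, getting the same result without vectorization means looping (here, with a list comprehension) over the items one at a time. A quick self-contained check:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 10])

vectorized = x**2                     # one expression, applied to every element
looped = np.array([v**2 for v in x])  # the same thing, one element at a time

print(vectorized.tolist())                 # [1, 4, 9, 16, 25, 100]
print(np.array_equal(vectorized, looped))  # True
```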

Similarly, we can subtract one array from another with `a - b`, which subtracts the first element of `b` from the first element of `a`, the second from the second, and so on.

In [172]:
a = np.array([1,2,3,4,5])
b = np.array([1,4,-2,5,9])

a-b

array([ 0, -2,  5, -1, -4])
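Two related behaviors worth knowing: a single number is applied to every element, and arrays of different lengths can't be paired up elementwise. A sketch reusing `a` and `b` from above:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([1, 4, -2, 5, 9])

print((a - b).tolist())   # [0, -2, 5, -1, -4], same as above
print((a + 10).tolist())  # a scalar is added to every element: [11, 12, 13, 14, 15]

# arrays of different lengths cannot be matched up element by element
try:
    a - np.array([1, 2, 3])
except ValueError as err:
    print("ValueError:", err)
```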

## Your Turn

Using the array knowledge you just reviewed, multiply the arrays `a` and `b` together and then find the sum of those products.
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [173]:
###


# `np.random`

`np.random` is a very useful module that lets us generate random values from different distributions, or randomly choose items from a collection. The two functions we use most often are `np.random.choice()` and `np.random.normal()`.

`np.random.choice()` takes 3 main arguments:

- `a`: an array or collection of items to choose from.

- `size`: an integer that represents how many items you want to choose/sample from `a`

- `replace`: a boolean that controls whether the function may select the same item more than once in the sample (the default is `True`).
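A small sketch of what `replace` changes, drawing from a made-up collection `range(10)`. Without replacement each item can be chosen at most once, so asking for more items than the collection holds raises an error.

```python
import numpy as np

# without replacement: 10 draws from 10 items picks every item exactly once, in random order
shuffled = np.random.choice(range(10), 10, replace=False)
print(sorted(shuffled.tolist()))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# with replacement: items can repeat, so 20 draws from 10 items is fine
with_repeats = np.random.choice(range(10), 20, replace=True)
print(len(with_repeats))  # 20

# without replacement you cannot draw more items than the collection holds
try:
    np.random.choice(range(10), 11, replace=False)
except ValueError as err:
    print("ValueError:", err)
```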


`np.random.normal()` takes 3 main arguments as well:

- `loc`: the mean of the normal distribution to sample from

- `scale`: the standard deviation of the normal distribution to sample from

- `size`: the number of samples to draw.
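A quick sanity check on `np.random.normal()`: with a large sample, the sample mean and standard deviation should land close to `loc` and `scale`. The numbers here (mean 50, sd 5, 10,000 draws, seed 0) are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fix the random numbers so the run is reproducible
samp = np.random.normal(50, 5, 10000)  # loc=50, scale=5, size=10000

print(samp.shape)   # (10000,)
print(samp.mean())  # close to 50
print(samp.std())   # close to 5
```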

In [174]:
# draw 100 samples from a standard normal distribution with mean = 0, sd = 1

samp100 = np.random.normal(0,1,100)

samp100

array([ 0.06037281,  2.31980558, -0.05250125,  1.25137367, -1.43488252,
        0.01167165,  1.18775872, -1.62549287, -1.96151595,  1.05325513,
       -0.19399823, -1.00558654, -0.44203004,  1.44797613, -0.03492908,
        0.25743968, -0.23727859,  1.15154034,  1.55350651,  0.59278884,
       -0.65779717, -0.69775142, -0.10953791, -0.86086722, -0.41696215,
        0.72465504, -1.35365452,  0.31012559, -0.77205588, -0.97903983,
        0.1534542 ,  2.47970649,  1.76662198,  0.00626958,  1.52235068,
        0.64952245, -0.5347307 , -0.88305475,  0.07496434,  0.31652437,
        0.36300013,  1.36468855, -0.56676479, -0.435667  , -0.68701119,
       -0.39508733, -0.34959988,  0.69949934,  0.02177523,  0.90734589,
       -0.60316481, -1.34394014, -1.0075919 ,  2.04212632,  1.36476542,
        1.73822604,  0.69514256,  1.04484113, -0.43117931,  0.07836263,
       -1.18226274,  0.1960779 , -0.95705686,  1.29531505, -0.77559543,
       -0.17663151,  0.54851995,  0.90502324,  0.81818959, -0.47

In [175]:
# draw 657 samples from `my_list` with replacement

my_list = range(0,250)

samp657 = np.random.choice(my_list, 657, replace = True)
samp657

array([ 69,  44,  61,  30, 238,  22,  97, 171,  13, 148, 224, 143, 193,
         7,  89, 236,   2, 237,  71,  91,  24, 111,  97, 152,  21, 243,
       114, 174, 142, 203,  24, 249, 171, 244, 237, 200, 228,  80, 239,
       161, 125, 128, 109, 236, 131,  11, 112,  34,  20, 192,  30,  36,
        70,  61, 110, 249, 132, 183,  45,  70,  18, 223, 134,  98,  13,
        53, 174, 193,  74,  14, 173, 120,  23,  23, 244, 248, 222, 160,
       160, 117, 155, 148,  22, 203,  28,  45, 183,   5, 108, 196, 190,
        29,   2, 215,  67, 119, 141,  87,  40, 177, 179, 122, 208, 225,
        39, 249,  62, 206, 198, 153,  49,  62, 124, 105, 230,  12,  53,
        85, 192, 111,  74,  84,  84, 181, 125,  76, 121,  83, 178, 227,
        15,  51, 237,  68,  99, 118, 164,  31,  56, 221,  61,  29, 162,
       138, 233, 183, 204, 122,  18,  71,  21, 122,  73,  33, 147,  17,
       106,  35,  13, 181,  37, 137,  79,  45,  80, 136, 164, 110, 206,
        60,  85, 204, 126, 157,  94, 204, 219,  29, 131, 217, 16

In [181]:
np.random.seed(123) # set the seed ONCE, before the loop, so the 10 lists differ but are reproducible
for i in range(0,10):
    print(np.random.choice(range(0,1000), 10, replace = True).tolist())

[510, 365, 382, 322, 988, 98, 742, 17, 595, 106]
[123, 569, 214, 737, 96, 113, 638, 47, 73, 544]
[942, 224, 111, 409, 339, 846, 253, 420, 608, 208]
[68, 817, 823, 451, 2, 340, 39, 322, 596, 559]
[504, 957, 176, 135, 873, 99, 380, 860, 180, 358]
[865, 213, 630, 862, 411, 290, 993, 588, 680, 899]
[106, 837, 576, 843, 418, 826, 394, 790, 717, 146]
[484, 271, 411, 765, 158, 180, 582, 365, 371, 154]
[720, 390, 782, 255, 244, 359, 971, 950, 967, 129]
[555, 186, 357, 695, 537, 434, 980, 824, 305, 780]


## Your Turn

Choose 100 samples from `range(100, 1000)` without replacement.
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [176]:
###

Make a data frame with two columns, `x` and `y`. `x` should be created by randomly sampling 100 samples from a normal distribution with mean = 0, and sd = 1. `y` should be created by randomly sampling 100 samples from a normal distribution with mean = 12, sd = 20.
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" width = 200px />

In [177]:
###