# Module 4 Assignment

A few things you should keep in mind when working on assignments:

1. Fill in every place that says `YOUR CODE HERE`. Do **not** write your answer anywhere else. Anything you write elsewhere will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_ → _Restart & Run All_ to restart the kernel and run all cells.

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).


In [1]:
from nose.tools import assert_equal, assert_almost_equal, assert_true

import sys
import os

import numpy as np
import pandas as pd
from numpy.testing import assert_array_almost_equal

-----

# Problem 1: Generating Random Data

The code cell below declares a function called `gen_rand` that accepts two parameters: `n` and `random_state`, which are both integers.

For this problem, you need to use a pseudo random number generator with a fixed seed to get reproducible results.

Complete the following tasks:
- Use NumPy to create a [pseudo random number generator](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html).

- Use the provided `random_state` parameter to seed this pseudo random number generator.

- Use the function `rand` from this pseudo random number generator to create `n` random values in a NumPy array.
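
As a minimal sketch of why a fixed seed gives reproducible results (not the graded solution), the snippet below builds two independent `RandomState` generators from the same seed and checks that they produce identical values. The seed value `0` and all variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative seed; any fixed integer behaves the same way
seed = 0

# Two independent generators created from the same seed
gen_a = np.random.RandomState(seed)
gen_b = np.random.RandomState(seed)

# rand(3) draws 3 samples from a uniform distribution over [0, 1)
sample_a = gen_a.rand(3)
sample_b = gen_b.rand(3)

# Same seed, same sequence of values
same_seed_matches = np.array_equal(sample_a, sample_b)

# A generator built from a different seed yields a different sequence
sample_c = np.random.RandomState(seed + 1).rand(3)
```

Because each generator keeps its own internal state, seeding one this way does not affect NumPy's global random state.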

-----

In [2]:
def gen_rand(n, random_state=23):
    '''
    
    Generates n random samples in a NumPy array
    
    Parameters
    ----------
    n: integer value specifying the number of samples to generate
    random_state: integer containing seed to use for pseudo random number generator.
    
    Returns
    -------
    NumPy array with n random samples from a uniform distribution over [0, 1).
    '''
    
    ### BEGIN SOLUTION
    r_gen = np.random.RandomState(random_state)
    return r_gen.rand(n)
    ### END SOLUTION

In [3]:
ans = gen_rand(5)
sol = [0.5172978838465893, 0.9469626038148141, 0.7654597593969069, 0.2823958439671127, 0.22104536326165258]
assert_array_almost_equal(ans, sol)

-----

# Problem 2: Basic Vectorized Operations

The code cell below declares the function `basic_op` that accepts one parameter `a` that is a NumPy array.

For this problem: 
- Complete the function declared below so that it computes and returns the result of this expression:  $a * \pi - e$
    - Hint: use NumPy's implementations of [pi](https://www.numpy.org/devdocs/reference/constants.html#numpy.pi) and [e](https://www.numpy.org/devdocs/reference/constants.html#numpy.e).
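
To illustrate what "vectorized" means here, the following sketch applies a different formula (the circumference $2 \pi r$) to a whole array at once and compares it with an explicit loop. The array `radii` and the other names are made up for this example.

```python
import numpy as np

# Illustrative input array (an assumption, not part of the assignment)
radii = np.array([0.0, 1.0, 2.5])

# Vectorized: the scalars 2 and np.pi are broadcast over every element
circumference = 2 * np.pi * radii

# Equivalent element-by-element loop, shown only for comparison
circumference_loop = np.array([2 * np.pi * r for r in radii])

# Both approaches give the same values
results_match = np.allclose(circumference, circumference_loop)
```

The vectorized form is shorter and avoids a Python-level loop, which is generally faster for large arrays.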
    
----

In [4]:
def basic_op(a):
    '''
    Perform a basic vectorized operation on the provided array    

    Parameters
    ----------
    a : NumPy array
    
    Returns
    -------
    a NumPy array containing the result of
    the following operation: a * π - e
    '''
    
    ### BEGIN SOLUTION
    return ((a * np.pi) - np.e)
    ### END SOLUTION

In [5]:
ans2 = basic_op(gen_rand(5))
sol2 = [-1.0931426 ,  0.25668893, -0.31351907, -1.83110912, -2.02384734]
assert_array_almost_equal(ans2, sol2)

-----

# Problem 3: Select Rows from a DataFrame

The code cell below reads a DataFrame from a file. Select the first 2 rows of the DataFrame and assign them to the variable `df_2r`.
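
As a hedged illustration of positional row selection, the sketch below uses a small made-up DataFrame (`toy` is hypothetical, not the iris data) and shows three common ways to take the leading rows.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({'x': [10, 20, 30, 40], 'y': ['a', 'b', 'c', 'd']})

# Three ways to select the first three rows by position
first_three_slice = toy[0:3]      # slicing the DataFrame selects rows
first_three_iloc = toy.iloc[:3]   # explicit integer-position indexing
first_three_head = toy.head(3)    # convenience method

# All three return the same rows and columns
all_equal = (first_three_slice.equals(first_three_iloc)
             and first_three_iloc.equals(first_three_head))
```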

-----

In [6]:
df = pd.read_csv('data/iris.csv')
df.head()

| | sepal length (in cm) | sepal width (in cm) | petal length (in cm) | petal width (in cm) | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [7]:
### BEGIN SOLUTION
df_2r = df[0:2]
### END SOLUTION

In [8]:
assert_equal(df_2r.shape, (2, 5), msg="Your answer does not match the solutions")
assert_equal(df_2r.iloc[0,0], 5.1, msg="Your answer does not match the solutions")
assert_equal(df_2r.iloc[1,0], 4.9, msg="Your answer does not match the solutions")


-----

# Problem 4: Select Columns from a DataFrame

The code cell below reads a DataFrame from a file. Select the 2 columns that contain the **sepal length** and **petal length** data and assign them to the variable `df_2c`.
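
The difference between selecting one column and selecting several can be sketched with a small hypothetical DataFrame (`toy` and its column names are assumptions for illustration): a list of labels returns a DataFrame, while a single label returns a Series.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# A list of column labels returns a DataFrame with those columns, in that order
subset = toy[['a', 'c']]

# A single label (no list) returns a Series instead
single = toy['a']
```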

-----

In [9]:
df = pd.read_csv('data/iris.csv')
df.head()

| | sepal length (in cm) | sepal width (in cm) | petal length (in cm) | petal width (in cm) | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [10]:
### BEGIN SOLUTION
df_2c = df[['sepal length (in cm)', 'petal length (in cm)']]
### END SOLUTION

In [11]:
assert_equal(df_2c.shape, (150, 2), msg="Your answer does not match the solutions")
assert_equal(df_2c.iloc[0,0], 5.1, msg="Your answer does not match the solutions")
assert_equal(df_2c.iloc[0,1], 1.4, msg="Your answer does not match the solutions")


-----

# Problem 5: Handle Missing Values in a DataFrame

The code cell below creates a DataFrame `df_m` with missing values. You need to:
- Drop the rows that contain missing values and assign the result to `df_drop`.
- Fill the missing values with the mean of their column and assign the result to `df_fill`.
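
The two strategies can be sketched on a small hypothetical DataFrame (`toy` and its columns `p` and `q` are made up for this example). Both `dropna` and `fillna` return new objects by default and leave the original unchanged.

```python
import pandas as pd

# Hypothetical toy data with one missing value per column
toy = pd.DataFrame({'p': [1.0, None, 3.0], 'q': [None, 4.0, 8.0]})

# dropna() removes every row that has at least one missing value
dropped = toy.dropna()

# mean() skips missing values by default, giving one mean per column
col_means = toy.mean()

# fillna() with a Series fills each column using the entry with the matching label
filled = toy.fillna(col_means)
```

Dropping rows discards data, while filling keeps every row at the cost of introducing estimated values.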

-----

In [12]:
df_m = pd.DataFrame({'Height(in inch)':[68, 72, 70, None], 'Weight(in pound)':[150, 200, None, 220]})
df_m

| | Height(in inch) | Weight(in pound) |
|---|---|---|
| 0 | 68.0 | 150.0 |
| 1 | 72.0 | 200.0 |
| 2 | 70.0 | NaN |
| 3 | NaN | 220.0 |


In [13]:
### BEGIN SOLUTION
df_drop = df_m.dropna()
df_fill = df_m.fillna(df_m.mean())
### END SOLUTION

In [14]:
assert_equal(df_drop.shape, (2, 2), msg="Your answer does not match the solutions")
assert_equal(df_fill.shape, (4, 2), msg="Your answer does not match the solutions")
assert_equal(df_fill.iloc[2,1], 190, msg="Your answer does not match the solutions")
assert_equal(df_fill.iloc[3,0], 70, msg="Your answer does not match the solutions")


-----

**© 2019: Gies College of Business at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode