# Module 5 Assignment 
A few things you should keep in mind when working on assignments:

1. Make sure you fill in any place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_ → _Restart & Run All_ to restart the kernel and run all cells.

3. Do not change the title (i.e., the file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).


In [1]:
import pandas as pd
import numpy as np

from nose.tools import assert_equal, assert_true, assert_false
from numpy.testing import assert_array_almost_equal, assert_array_equal


# Problem 1: Read in a dataset

For this problem you will read in a dataset using pandas. In the cell below, the function *read_data* has a parameter, *file_path*, which contains the path to a dataset.
- Use the *read_csv* function from pandas to read in the dataset from the file path and return the resulting dataframe.
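
*read_csv* splits each line into fields, infers a type for each column, and turns empty fields into missing values (`NaN`). The sketch below is illustrative only: it uses a small, made-up CSV string wrapped in `io.StringIO` instead of the real `data/dow_jones_index.data` file, and it assumes that file is comma-separated with a header row, which the checks in the next cell imply. The empty-field behaviour appears to be why the expected first row of the dataset contains `np.nan` for `percent_change_volume_over_last_wk` and `previous_weeks_volume`.

```python
import io

import numpy as np
import pandas as pd

# A tiny, made-up CSV standing in for data/dow_jones_index.data
# (assumption: the real file is comma-separated with a header row).
csv_text = """quarter,stock,date,close,previous_weeks_volume
1,AA,1/7/2011,16.42,
1,AA,1/14/2011,15.97,239655616
"""

# read_csv accepts any file-like object, so StringIO avoids touching the disk.
df_small = pd.read_csv(io.StringIO(csv_text))

print(df_small.shape)   # (2, 5)
print(df_small.dtypes)

# The empty field in the first data row is parsed as NaN, which forces
# previous_weeks_volume to float64 instead of int64.
print(np.isnan(df_small.loc[0, 'previous_weeks_volume']))  # True
```

Note that the `date` column stays as plain strings (dtype `object`) unless you ask pandas to parse it; the checks in this assignment expect strings such as `'1/7/2011'`.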

In [2]:
def read_data(file_path):
    '''
    Reads in a dataset using pandas.
    
    Parameters
    ----------
    file_path : string containing path to a file
    
    Returns
    -------
    pandas dataframe with data read in from the file path
    '''
    ### BEGIN SOLUTION
    df = pd.read_csv(file_path)
    return df
    ### END SOLUTION


In [3]:
df = read_data('data/dow_jones_index.data')
assert_equal(len(df), 750, msg="The dataset should have 750 rows. Your solution only has %s"%len(df))
assert_equal(set(df.columns.tolist()), set(['quarter', 'stock', 'date', 'open', 'high', 'low', 'close',
                                            'volume', 'percent_change_price', 'percent_change_volume_over_last_wk',
                                            'previous_weeks_volume', 'next_weeks_open', 'next_weeks_close',
                                            'percent_change_next_weeks_price', 'days_to_next_dividend',
                                            'percent_return_next_dividend']), 
             msg="Your column names do not match the solutions")

ans = df.head().values.tolist()
sol = [[1, 'AA', '1/7/2011', 15.82, 16.72, 15.78, 16.42, 239655616, 3.7926699999999998, np.nan, np.nan,
        16.71, 15.97, -4.42849, 26, 0.182704],
       [1, 'AA', '1/14/2011', 16.71, 16.71, 15.64, 15.97, 242963398, -4.42849, 1.380223028, 239655616.0,
        16.19, 15.79, -2.4706599999999996, 19, 0.187852],
       [1, 'AA', '1/21/2011', 16.19, 16.38, 15.6, 15.79, 138428495, -2.4706599999999996, -43.02495926,
        242963398.0, 15.87, 16.13, 1.63831, 12, 0.189994],
       [1, 'AA', '1/28/2011', 15.87, 16.63, 15.82, 16.13, 151379173, 1.63831, 9.355500109, 138428495.0,
        16.18, 17.14, 5.93325, 5, 0.18598900000000002],
       [1, 'AA', '2/4/2011', 16.18, 17.39, 16.18, 17.14, 154387761, 5.93325, 1.987451735, 151379173.0,
        17.33, 17.37, 0.230814, 97, 0.175029]]

assert_array_equal(ans, sol, err_msg="Your answer does not match the solution.")

print("2 random rows of the dow jones dataset:")
df.sample(2)

2 random rows of the dow jones dataset:


|  | quarter | stock | date | open | high | low | close | volume | percent_change_price | percent_change_volume_over_last_wk | previous_weeks_volume | next_weeks_open | next_weeks_close | percent_change_next_weeks_price | days_to_next_dividend | percent_return_next_dividend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 670 | 2 | PG | 6/17/2011 | 64.65 | 65.22 | 63.33 | 64.69 | 46861128 | 0.061872 | -3.576674 | 48599369.0 | 64.51 | 62.59 | -2.97628 | 33 | 0.819292 |
| 454 | 2 | DD | 4/21/2011 | 54.28 | 56.0 | 53.19 | 55.91 | 15537427 | 3.00295 | -35.421288 | 24059673.0 | 55.69 | 56.79 | 1.97522 | 20 | 0.733321 |
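
`df.sample(2)` draws a different pair of rows each time the cell runs, so the two rows shown above will generally not match a rerun. If a repeatable draw is wanted, *DataFrame.sample* accepts a `random_state` argument. A minimal sketch on a small, made-up frame (the name `frame` is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in frame; any DataFrame behaves the same way.
frame = pd.DataFrame({'stock': ['AA', 'BAC', 'DD', 'IBM', 'PG'],
                      'close': [16.42, 10.80, 55.91, 170.58, 64.69]})

# Without random_state, sample() picks different rows on every call.
# Fixing random_state makes the draw repeatable.
first = frame.sample(2, random_state=0)
second = frame.sample(2, random_state=0)

print(first.index.equals(second.index))  # True
```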


# Problem 2: Selecting the first n rows of a dataframe

In the code cell below, the function *get_head_rows* accepts two parameters: *df*, a dataframe, and *n*, an integer.

For this problem:
- Return the first *n* rows of *df*.
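
*DataFrame.head* selects rows by position, not by index label. A small sketch on made-up data (the close prices of the first five AA rows) shows how it relates to positional slicing with *iloc* and how it behaves at the edges:

```python
import pandas as pd

frame = pd.DataFrame({'close': [16.42, 15.97, 15.79, 16.13, 17.14]})

# head(n) returns the first n rows by position, as a new DataFrame.
top3 = frame.head(3)

# Positional slicing with iloc selects the same rows.
print(top3.equals(frame.iloc[:3]))  # True

# A negative n returns every row except the last |n| rows.
print(len(frame.head(-2)))  # 3

# Asking for more rows than exist simply returns the whole frame.
print(len(frame.head(10)))  # 5
```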

In [4]:
def get_head_rows(df, n):
    '''    
    Parameters
    ----------
    df: pandas DataFrame
    n: integer
    Returns
    -------
    returns first n rows of df
    '''
    ### BEGIN SOLUTION
    return df.head(n)
    ### END SOLUTION


In [5]:
assert_equal(get_head_rows(df, 3).shape[0], 3, msg="You did not return first 3 rows")
assert_equal(get_head_rows(df, 4).shape[0], 4, msg="You did not return first 4 rows")


In [6]:
df[df.close<11]

|  | quarter | stock | date | open | high | low | close | volume | percent_change_price | percent_change_volume_over_last_wk | previous_weeks_volume | next_weeks_open | next_weeks_close | percent_change_next_weeks_price | days_to_next_dividend | percent_return_next_dividend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 409 | 2 | BAC | 6/10/2011 | 11.18 | 11.2 | 10.41 | 10.8 | 873241317 | -3.39893 | 46.413698 | 596420503.0 | 10.89 | 10.68 | -1.92837 | 82 | 0.092593 |
| 410 | 2 | BAC | 6/17/2011 | 10.89 | 11.12 | 10.4 | 10.68 | 889460755 | -1.92837 | 1.857383 | 873241317.0 | 10.59 | 10.52 | -0.661001 | 75 | 0.093633 |
| 411 | 2 | BAC | 6/24/2011 | 10.59 | 10.94 | 10.48 | 10.52 | 603098073 | -0.661001 | -32.195089 | 889460755.0 | 10.52 | 11.09 | 5.41825 | 68 | 0.095057 |


# Problem 3: Selecting stocks under a certain price

In the code cell below, the function *get_stocks_by_price* accepts two parameters: *df*, a dataframe, and *price_cut*, a float.

For this problem:
- Return all rows whose close price is less than *price_cut*.
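
Filtering like `df[df.close < 11]` works in two steps: the comparison builds a boolean Series (a "mask") with one entry per row, and indexing with that mask keeps only the rows where it is `True`. A sketch on a small, made-up frame (names and the 11.00 value are hypothetical) illustrates this, including the fact that `<` is strict:

```python
import pandas as pd

# Made-up data; the second row sits exactly on the cut-off.
frame = pd.DataFrame({'stock': ['BAC', 'BAC', 'AA', 'IBM'],
                      'close': [10.80, 11.00, 15.97, 170.58]})

price_cut = 11.0

# Comparing a column with a scalar yields a boolean Series (the mask).
# '<' is strict, so the close of exactly 11.00 is not selected.
mask = frame['close'] < price_cut
print(mask.tolist())  # [True, False, False, False]

# Indexing with the mask keeps the matching rows and their original labels.
cheap = frame[mask]
print(cheap.index.tolist())  # [0]

# .loc with the same mask is equivalent and can also pick columns.
print(frame.loc[mask, 'stock'].tolist())  # ['BAC']
```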

In [7]:
def get_stocks_by_price(df, price_cut):
    '''
    
    Parameters
    ----------
    df: pandas dataframe containing data from Problem 1's solution
    price_cut: float.
    Returns
    -------
    
    Return all data with close price less than price_cut.
    '''
    ### BEGIN SOLUTION

    return df[df.close<price_cut]
    ### END SOLUTION


In [8]:
assert_equal(get_stocks_by_price(df, 12).shape[0], 7, msg="You did not return a correct pandas dataframe object")
assert_true(get_stocks_by_price(df, 12).close.max()<12, msg="You did not return a correct pandas dataframe object")


# Problem 4: Get the highest and lowest close price for a stock

In the code cell below, the function *get_high_low_close* accepts two parameters: *df*, a dataframe, and *symbol*, a string.

For this problem:
- Return the highest and lowest close prices of the stock represented by *symbol*.
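
One way to approach this is to filter the `close` column once for the given symbol and then take its maximum and minimum; *groupby* computes the same pair for every symbol at once, which can be handy for checking answers. The frame below is made up for illustration (the IBM values reuse the expected extremes from the checks in this problem):

```python
import pandas as pd

frame = pd.DataFrame({'stock': ['AA', 'AA', 'AA', 'IBM', 'IBM'],
                      'close': [16.42, 15.97, 17.14, 170.58, 147.93]})

# Filter once, then take max and min of the same Series.
aa_close = frame.loc[frame['stock'] == 'AA', 'close']
print(aa_close.max(), aa_close.min())  # 17.14 15.97

# groupby returns one row per symbol with 'max' and 'min' columns.
summary = frame.groupby('stock')['close'].agg(['max', 'min'])
print(summary.loc['IBM'].tolist())  # [170.58, 147.93]
```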

In [9]:
def get_high_low_close(df, symbol):
    '''
    Get highest and lowest close price of a stock.
    
    Parameters
    ----------
    df: pandas dataframe containing data from Problem 1's solution
    symbol: stock symbol
    
    Returns
    -------
    returns two values, highest and lowest close price of the stock.
    '''
    ### BEGIN SOLUTION
    # Filter once, then reuse the selected close prices.
    closes = df.loc[df.stock == symbol, 'close']
    return closes.max(), closes.min()
    ### END SOLUTION


In [10]:
assert_equal(get_high_low_close(df, 'AA'), (17.92, 14.72), msg="You did not return correct highest and lowest close of AA")
assert_equal(get_high_low_close(df, 'IBM'), (170.58, 147.93), msg="You did not return correct highest and lowest close of IBM")


-----

**&copy; 2019: Gies College of Business at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode