# Module 5 Assignment
A few things you should keep in mind when working on assignments:

1. Make sure you fill in every place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_, then _Restart & Run All_ to restart the kernel and run all cells.

3. Do not change the title (i.e., the file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).


In [1]:
import pandas as pd
import numpy as np

from nose.tools import assert_equal, assert_true, assert_false, assert_almost_equal
from numpy.testing import assert_array_almost_equal, assert_array_equal


# Problem 1: Read in a subset of a dataset

For this problem you will read in a dataset using pandas. In the cell below, the function *read_data* takes a parameter "file_path", which contains the path to a dataset.
- Use the *read_csv* function from pandas to read in the dataset from the file path, then return a subset of the dataframe that keeps only the following columns:
quarter, stock, date, open, high, low, close, volume
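
As a rough illustration of the read-then-subset pattern described above, here is a minimal sketch on toy data. The in-memory CSV and its column names (`id`, `name`, `score`, `notes`) are hypothetical and are not part of the Dow Jones dataset.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
toy_csv = io.StringIO(
    "id,name,score,notes\n"
    "1,a,0.5,x\n"
    "2,b,0.7,y\n"
)

# read_csv accepts a path or a file-like object.
toy = pd.read_csv(toy_csv)

# Indexing with a list of column names returns a dataframe
# containing only those columns, in the order listed.
subset = toy[['id', 'score']]
print(subset.columns.tolist())
```

Passing a list of names inside the indexing brackets is what keeps the result a dataframe rather than a single column.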

In [2]:
def read_data(file_path):
    '''
    Reads in a dataset using pandas.
    
    Parameters
    ----------
    file_path : string containing path to a file
    
    Returns
    -------
    A subset of the pandas dataframe read in from the file path, keeping only:
    quarter, stock, date, open, high, low, close, volume
    '''
    ### BEGIN SOLUTION
    df = pd.read_csv(file_path)
    return df[['quarter', 'stock', 'date', 'open', 'high', 'low', 'close', 'volume']]
    ### END SOLUTION
    

In [3]:
df = read_data('data/dow_jones_index.data')
assert_equal(len(df), 750, msg="The dataset should have 750 rows. Your solution has %s" % len(df))
assert_equal(set(df.columns.tolist()), set(['quarter', 'stock', 'date', 'open', 'high', 'low', 'close', 'volume']), 
             msg="Your column names do not match the solutions")

ans = df.head().values.tolist()
sol = [[1, 'AA', '1/7/2011', 15.82, 16.72, 15.78, 16.42, 239655616],
       [1, 'AA', '1/14/2011', 16.71, 16.71, 15.64, 15.97, 242963398],
       [1, 'AA', '1/21/2011', 16.19, 16.38, 15.6, 15.79, 138428495],
       [1, 'AA', '1/28/2011', 15.87, 16.63, 15.82, 16.13, 151379173],
       [1, 'AA', '2/4/2011', 16.18, 17.39, 16.18, 17.14, 154387761]]

assert_array_equal(ans, sol, err_msg="Your answer does not match the solution.")

print("2 random rows of the dow jones dataset:")
df.sample(2)

2 random rows of the dow jones dataset:


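|     | quarter | stock | date      | open  | high  | low   | close | volume   |
|-----|---------|-------|-----------|-------|-------|-------|-------|----------|
| 212 | 1       | KO    | 3/4/2011  | 64.17 | 65.87 | 63.86 | 65.21 | 61665987 |
| 601 | 2       | MCD   | 5/20/2011 | 80.44 | 82.85 | 80.44 | 82.33 | 27816875 |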


# Problem 2: Selecting data for a particular stock

In the code cell below, the function *select_stock* accepts 2 parameters: *df*, which is a dataframe (created in Problem 1), and *symbol*, which is a string containing a stock symbol that appears in *df*.

For this problem:
- Select rows from the dataframe passed into the function ("df") only where the stock symbol, which is stored in the **stock** column, is equal to the value of the function parameter "symbol".


- Return the resulting dataframe containing data only for a particular stock symbol.
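
The row selection described above is commonly done with a boolean mask. The sketch below uses a small hypothetical dataframe (symbols `X` and `Y` are made up) to show how the mask is built and applied.

```python
import pandas as pd

# Hypothetical toy data; 'X' and 'Y' are made-up symbols.
toy = pd.DataFrame({
    'stock': ['X', 'Y', 'X'],
    'close': [1.0, 2.0, 3.0],
})

# Comparing a column to a value yields one True/False per row.
mask = toy['stock'] == 'X'

# Indexing with the mask keeps only the rows where it is True.
only_x = toy[mask]
print(only_x)
```

The original row labels are preserved in the result, so `only_x` here keeps index labels 0 and 2.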

In [4]:
def select_stock(df, symbol):
    '''
    Selects data only containing a particular stock symbol.
    
    Parameters
    ----------
    df: dataframe containing data from the dow jones index
    symbol: string containing the stock symbol to select
    
    Returns
    -------
    dataframe containing a particular stock
    '''
    ### BEGIN SOLUTION

    return df[df.stock==symbol]
    ### END SOLUTION


In [5]:
df_AA = select_stock(df.copy(), 'AA')
assert_equal(len(df_AA), 25, msg='There are only 25 observations of the stock symbol AA')

ans = df_AA[['open', 'close', 'high', 'low']].head().values.tolist()
sol = [[15.82, 16.42, 16.72, 15.78],
       [16.71, 15.97, 16.71, 15.64],
       [16.19, 15.79, 16.38, 15.6],
       [15.87, 16.13, 16.63, 15.82],
       [16.18, 17.14, 17.39, 16.18]]
assert_array_equal(ans, sol, err_msg="Your first 5 rows do not match the solutions.")

df_IBM = select_stock(df.copy(), 'IBM')
assert_equal(len(df_IBM), 25, msg='There are only 25 observations of the stock symbol IBM')

ans = df_IBM[['open', 'close', 'high', 'low']].head().values.tolist()
sol = [[147.21, 147.93, 148.86, 146.64],
       [147.0, 150.0, 150.0, 146.0],
       [149.82, 155.5, 156.78, 149.38],
       [155.42, 159.21, 164.35, 155.33],
       [159.18, 164.0, 164.2, 158.68]]
assert_array_equal(ans, sol, err_msg="Your first 5 rows do not match the solutions.")

# Problem 3: Selecting data for a particular stock in a particular quarter

In the code cell below, the function *select_stock_quarter* accepts 3 parameters: *df*, which is a dataframe (created in Problem 1); *symbol*, which is a string containing a stock symbol that appears in *df*; and *quarter*, which is the quarter number to select.

For this problem:
- Select rows from the dataframe passed into the function ("df") only where the stock symbol is equal to the value of the function parameter "symbol" and the quarter is equal to the value of the function parameter "quarter".


- Return the resulting dataframe containing data only for a particular stock symbol in the particular quarter.
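
When two conditions must hold at once, the masks can be combined element-wise. The sketch below assumes a hypothetical toy dataframe and shows why each comparison is wrapped in parentheses.

```python
import pandas as pd

# Hypothetical toy data with made-up symbols and quarters.
toy = pd.DataFrame({
    'stock':   ['X', 'X', 'Y', 'Y'],
    'quarter': [1, 2, 1, 2],
    'close':   [1.0, 2.0, 3.0, 4.0],
})

# Use & (element-wise AND), not the keyword "and".
# Parentheses are required because & binds more tightly than ==.
both = toy[(toy['stock'] == 'X') & (toy['quarter'] == 2)]
print(both)
```

Using the Python keyword `and` between two pandas Series raises a `ValueError`, because the truth value of a whole Series is ambiguous.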

In [6]:
def select_stock_quarter(df, symbol, quarter):
    '''
    Selects data only containing a particular stock symbol in the particular quarter.
    
    Parameters
    ----------
    df: dataframe containing data from the dow jones index
    symbol: string containing the stock symbol to select
    quarter: quarter number to select
    
    Returns
    -------
    dataframe containing a particular stock in a particular quarter
    '''
    ### BEGIN SOLUTION

    return df[(df.stock==symbol)&(df.quarter==quarter)]
    ### END SOLUTION


In [7]:
ans = df_AA[['open', 'close', 'high', 'low']].head().values.tolist()
ans

[[15.82, 16.42, 16.72, 15.78],
 [16.71, 15.97, 16.71, 15.64],
 [16.19, 15.79, 16.38, 15.6],
 [15.87, 16.13, 16.63, 15.82],
 [16.18, 17.14, 17.39, 16.18]]

In [8]:
df_AA_2 = select_stock_quarter(df.copy(), 'AA', 2)
assert_equal(len(df_AA_2), 13, msg='There are only 13 observations of the stock symbol AA in 2nd quarter')

ans = df_AA[['open', 'close', 'high', 'low']].head().values.tolist()
sol = [[15.82, 16.42, 16.72, 15.78],
      [16.71, 15.97, 16.71, 15.64],
      [16.19, 15.79, 16.38, 15.6],
      [15.87, 16.13, 16.63, 15.82],
      [16.18, 17.14, 17.39, 16.18]]
assert_array_equal(ans, sol, err_msg="Your first 5 rows do not match the solutions.")

df_IBM_1 = select_stock_quarter(df.copy(), 'IBM', 1)
assert_equal(len(df_IBM_1), 12, msg='There are only 12 observations of the stock symbol IBM in 1st quarter')

# Problem 4: Computing average volume of stocks

In the code cell below, the function *comp_average_volume* accepts 1 parameter, "df", which is a dataframe (created in Problem 1).

For this problem:
- Group df by the *stock* column and calculate the average volume of each stock.
- The resulting dataframe should have *stock* as a regular column instead of as the index.
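
To illustrate the difference between a grouped result indexed by the group key and one that keeps the key as a regular column, here is a minimal sketch on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data; symbols and volumes are made up.
toy = pd.DataFrame({
    'stock':  ['X', 'X', 'Y'],
    'volume': [10, 30, 5],
})

# Default behaviour: the group key becomes the index of the result.
indexed = toy.groupby('stock')['volume'].mean()

# as_index=False keeps the group key as an ordinary column.
flat = toy.groupby('stock', as_index=False)['volume'].mean()

print(indexed)
print(flat)
```

Calling `reset_index()` on the indexed result is another way to turn the group key back into a column.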


In [9]:
def comp_average_volume(df):
    '''
    Computes average volume of each stock
    
    Parameters
    ----------
    df: Pandas dataframe(created by problem 1)
    
    Returns
    -------
    a pandas dataframe with stock column as regular column.
    '''
    ### BEGIN SOLUTION

    return df.groupby(by='stock', as_index=False).agg({'volume':'mean'})
    ### END SOLUTION


In [10]:
mean_vol = comp_average_volume(df)

assert_equal(mean_vol.shape, (30, 2))

ans = mean_vol.head().volume.values.tolist()
sol = [129638810.2, 35208481.6, 23781420.52, 722999135.56, 33731115.88]

assert_array_almost_equal(ans, sol, err_msg="Your first 5 rows do not match the solutions.")


# Problem 5: Compute sample statistics for a stock's close price

The code cell below contains a function called *comp_close_stat* that accepts 3 parameters: "df", which contains data from the Dow Jones index; "symbol", which is a particular stock symbol; and "stat", which contains 1 of 3 strings: "mean", "max", or "min".

For this problem:
- if the stat is equal to "mean" return the mean of the close price of stock "symbol"
- if the stat is equal to "min" return the minimum close price of stock "symbol"
- if the stat is equal to "max" return the maximum close price of stock "symbol"
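
One possible way to select a statistic by its string name, shown here only as a sketch on hypothetical toy data, is to pass the name to `Series.agg`, which accepts the names of built-in aggregations such as "mean", "min", and "max". An explicit `if stat == 'mean': ...` chain works equally well.

```python
import pandas as pd

# Hypothetical toy data; symbols and prices are made up.
toy = pd.DataFrame({
    'stock': ['X', 'X', 'Y'],
    'close': [1.0, 3.0, 10.0],
})

# Close prices for one symbol only.
closes = toy.loc[toy['stock'] == 'X', 'close']

# Look up each statistic by its string name.
results = {stat: closes.agg(stat) for stat in ('mean', 'min', 'max')}
print(results)
```

When comparing strings in an explicit chain, use `==` rather than `is`, since `is` checks object identity and not equality of contents.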


In [11]:
def comp_close_stat(df, symbol, stat='mean'):
    '''
    Computes a particular statistic for a particular stock close price.
    
    Parameters
    ----------
    df: Pandas dataframe
    symbol: stock symbol
    stat: mean, min or max
    
    Returns
    -------
    Mean, max or min close price for stock "symbol"
    '''
    ### BEGIN SOLUTION

    # Compare strings with ==, not "is": "is" tests object identity.
    closes = df[df.stock == symbol].close
    if stat == 'mean':
        return closes.mean()
    if stat == 'min':
        return closes.min()
    if stat == 'max':
        return closes.max()
    ### END SOLUTION


In [12]:
assert_equal(comp_close_stat(df, 'IBM', 'min'), 147.93)
assert_equal(comp_close_stat(df, 'IBM', 'max'), 170.58)
assert_almost_equal(comp_close_stat(df, 'AA', 'mean'), 16.5044)

-----

**&copy; 2019: Gies College of Business at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode