# Module 8 Assignment

A few things you should keep in mind when working on assignments:

1. Run the first code cell to import modules needed by this assignment before proceeding to problems.
2. Make sure you fill in every place that says `# YOUR CODE HERE`. Write your answer only where it says `# YOUR CODE HERE`; anything you write elsewhere will be removed or overwritten by the autograder.
3. Each problem has an autograder cell below the answer cell. Run the autograder cell to check your answer. If there's anything wrong in your answer, the autograder cell will display error messages.
4. Before you submit your assignment, make sure everything runs as expected. Go to the menubar, select Kernel, and Restart & Run all. If the notebook runs through the last code cell without an error message, you've answered all problems correctly.
5. Make sure that you save your work (in the menubar, select File → Save and Checkpoint).

-----

# Run Me First!

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from datetime import datetime, date, timedelta

from nose.tools import assert_equal, assert_true

-----

## Problem 1: Get Future Date

In the code cell below, we declare a function named `get_future_date` that takes two arguments: `current_date`, which is a datetime object, and `days`, which is an integer. The function returns the datetime object that is `days` days after `current_date`.

To complete this problem, finish writing the function `get_future_date`:
- Add `days` to `current_date` (Hint: use `timedelta`).
- Return the new datetime object.
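
As a small illustration of the hint above, a `timedelta` can be added to or subtracted from a `datetime`, and the calendar arithmetic (month lengths, leap years) is handled for you. The date and variable names below are arbitrary examples, not part of the assignment.

```python
from datetime import datetime, timedelta

# Arbitrary example date: the day before a leap day.
start = datetime(2020, 2, 28)

# Adding a timedelta moves forward in time; 2020 is a leap year.
next_day = start + timedelta(days=1)

# Subtraction works the same way, and other units such as weeks are accepted.
week_before = start - timedelta(weeks=1)

print(next_day)     # 2020-02-29 00:00:00
print(week_before)  # 2020-02-21 00:00:00
```

Because `timedelta` counts elapsed days rather than calendar months or years, adding 366 days across a leap year lands on the same calendar date one year later, which is what the second autograder check relies on.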

-----

In [2]:
def get_future_date(current_date, days):
    ### BEGIN SOLUTION
    return current_date + timedelta(days=days)
    ### END SOLUTION

In [3]:
assert_equal(get_future_date(datetime(2019, 9,9), 1), datetime(2019, 9, 10), msg="Function is not correct." )
assert_equal(get_future_date(datetime(2019, 9,9), 366), datetime(2020, 9, 9), msg="Function is not correct." )

-----

## Problem 2: Reading in Data

In the code cell below, we declare a function named `read_data` that takes one function parameter `file_path`, which is a string.

To complete this problem, finish writing the function `read_data`:
- Read data from the file specified in the string `file_path`, by using the Pandas `read_csv` function.
- Return the resulting DataFrame.
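
To see what `read_csv` returns without needing the assignment's data file, the sketch below reads a tiny in-memory CSV. The text in `csv_text` is a hypothetical two-row excerpt with only four columns, and `io.StringIO` stands in for a file path.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; the real data file has many more columns.
csv_text = (
    "quarter,stock,date,close\n"
    "1,AA,1/7/2011,16.42\n"
    "1,AA,1/14/2011,15.97\n"
)

# read_csv accepts a path or any file-like object and returns a DataFrame.
toy_df = pd.read_csv(io.StringIO(csv_text))

print(toy_df.shape)   # (2, 4): two rows, four columns
print(toy_df.dtypes)  # the date column is read as text, not as datetime
```

Note that `read_csv` does not parse dates unless asked to, which is why the next problem converts the date column explicitly.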

-----

In [4]:
def read_data(file_path):
    '''
    Parameters
    ----------
    file_path: string containing path to the dataset
    
    Returns
    -------
    Pandas DataFrame
    '''
    ###BEGIN SOLUTION###
    return pd.read_csv(file_path)
    ###END SOLUTION###

In [5]:
path = 'data/dow_jones_index.data'
dow_df = read_data(path)
assert_equal(dow_df.shape[1], 16, msg="The number of columns in your dataset does not match the solution")
assert_equal(dow_df.shape[0], 750, msg="The number of rows in your dataset does not match the solution")
dow_df.head(2)

Unnamed: 0,quarter,stock,date,open,high,low,close,volume,percent_change_price,percent_change_volume_over_last_wk,previous_weeks_volume,next_weeks_open,next_weeks_close,percent_change_next_weeks_price,days_to_next_dividend,percent_return_next_dividend
0,1,AA,1/7/2011,15.82,16.72,15.78,16.42,239655616,3.79267,,,16.71,15.97,-4.42849,26,0.182704
1,1,AA,1/14/2011,16.71,16.71,15.64,15.97,242963398,-4.42849,1.380223,239655616.0,16.19,15.79,-2.47066,19,0.187852


-----

## Problem 3: Set Datetime Index


For this problem you will use `dow_df`, created in the Problem 2 autograder cell.

To solve this problem do the following:
- Convert the date column in `dow_df` from `object` type to `datetime` type by using the Pandas function `to_datetime`.
- Create a datetime index from the date column using the Pandas `DatetimeIndex` constructor, and set it as the index of `dow_df`.

After this problem, dow_df has a datetime index.
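
The two steps can be tried on a toy DataFrame first. In this sketch, the frame `toy` and its two rows are illustrative only, and the explicit `format` argument is an optional assumption about how the date strings are written (month/day/year).

```python
import pandas as pd

# Toy frame whose date column holds strings, as it would after read_csv.
toy = pd.DataFrame({"date": ["1/7/2011", "1/14/2011"], "close": [16.42, 15.97]})

# Step 1: convert the string column to datetime64 values.
toy["date"] = pd.to_datetime(toy["date"], format="%m/%d/%Y")

# Step 2: build a DatetimeIndex from that column and use it as the row index.
toy.index = pd.DatetimeIndex(toy["date"])

print(type(toy.index).__name__)  # DatetimeIndex
print(toy.index[0])              # 2011-01-07 00:00:00
```

With a datetime index in place, rows can be selected by date strings and resampled by calendar period, which the remaining problems depend on.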


------

In [6]:
### BEGIN SOLUTION
dow_df.date = pd.to_datetime(dow_df.date)
dow_df.index = pd.DatetimeIndex(dow_df.date)
### END SOLUTION

In [7]:
assert_equal(type(dow_df.index), pd.core.indexes.datetimes.DatetimeIndex, msg="The index is not datetime object.")
dow_df.head(2)

date,quarter,stock,date,open,high,low,close,volume,percent_change_price,percent_change_volume_over_last_wk,previous_weeks_volume,next_weeks_open,next_weeks_close,percent_change_next_weeks_price,days_to_next_dividend,percent_return_next_dividend
2011-01-07,1,AA,2011-01-07,15.82,16.72,15.78,16.42,239655616,3.79267,,,16.71,15.97,-4.42849,26,0.182704
2011-01-14,1,AA,2011-01-14,16.71,16.71,15.64,15.97,242963398,-4.42849,1.380223,239655616.0,16.19,15.79,-2.47066,19,0.187852


---

## Problem 4: Slice DataFrame

Slice June 2011 stock information for 'AA'.

For this problem you will use **dow_df**, updated in Problem 3.

To solve this problem do the following:
- Use a boolean mask to get the rows for stock 'AA' from dow_df, and assign the resulting DataFrame to aa_df.
- Get all June 2011 data from aa_df and assign the resulting DataFrame to **aa_201106_df**.

After this problem, there is a new variable **aa_201106_df** defined.
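
As a rough sketch of the two steps, the example below filters a toy DataFrame by ticker and then selects one month with a partial date string. The frame `toy` and its values are made up for illustration. It uses `.loc` for the date selection because recent pandas versions no longer accept a partial date string in plain `[]` row indexing on a DataFrame.

```python
import pandas as pd

# Made-up values on a datetime index; two tickers, dates in May and June 2011.
toy = pd.DataFrame(
    {
        "stock": ["AA", "AXP", "AA", "AA"],
        "close": [10.0, 20.0, 30.0, 40.0],
    },
    index=pd.DatetimeIndex(
        ["2011-05-27", "2011-06-03", "2011-06-03", "2011-06-10"], name="date"
    ),
)

# Boolean mask: True where the row belongs to ticker 'AA'.
aa_toy = toy[toy["stock"] == "AA"]

# Partial string indexing: 'YYYY-MM' selects every row in that month.
aa_june = aa_toy.loc["2011-06"]

print(aa_june)
```

Here `aa_june` keeps only the two 'AA' rows dated in June 2011; the May row and the 'AXP' row are dropped.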

-----

In [8]:
### BEGIN SOLUTION
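# Boolean mask keeps only the rows for stock 'AA'.
aa_df = dow_df[dow_df.stock == 'AA']
# Partial string indexing with .loc selects all rows dated June 2011.
aa_201106_df = aa_df.loc['2011-06']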
### END SOLUTION

In [9]:
assert_equal(aa_201106_df.shape[0], 4, msg="aa_201106_df is not correct")
aa_201106_df

date,quarter,stock,date,open,high,low,close,volume,percent_change_price,percent_change_volume_over_last_wk,previous_weeks_volume,next_weeks_open,next_weeks_close,percent_change_next_weeks_price,days_to_next_dividend,percent_return_next_dividend
2011-06-03,2,AA,2011-06-03,16.73,16.83,15.77,15.92,77152591,-4.8416,-0.108849,77236662.0,15.92,15.28,-4.0201,61,0.188442
2011-06-10,2,AA,2011-06-10,15.92,16.03,15.17,15.28,94970970,-4.0201,23.094985,77152591.0,15.29,14.72,-3.72793,54,0.196335
2011-06-17,2,AA,2011-06-17,15.29,15.5,14.59,14.72,111273573,-3.72793,17.16588,94970970.0,14.67,15.23,3.81731,47,0.203804
2011-06-24,2,AA,2011-06-24,14.67,15.6,14.56,15.23,99423717,3.81731,-10.649299,111273573.0,15.22,16.31,7.16163,40,0.19698


---

## Problem 5: Resample DataFrame

Resample stock 'AA'.

For this problem you will use **dow_df**, updated in Problem 3.

To solve this problem do the following:
- Use a boolean mask to get the rows for stock `AA` from dow_df, and assign the resulting DataFrame to aa_df.
- Resample aa_df at month-end frequency (Hint: use the code 'M').
- Apply the mean() function to the Resampler object and assign the resulting DataFrame to **aa_resample**.

After this problem, there is a new variable **aa_resample** defined.
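
To make the effect of month-end resampling concrete, the sketch below averages a toy series by calendar month. The frame `toy` and its values are made up. It passes a `pd.offsets.MonthEnd()` object instead of the `'M'` string only so that it runs on any pandas version; newer releases rename that alias to `'ME'`. The toy frame contains only a numeric column because recent pandas versions raise an error when `mean()` meets a string column.

```python
import pandas as pd

# Made-up weekly closing prices: two observations in January, one in February.
toy = pd.DataFrame(
    {"close": [10.0, 20.0, 30.0]},
    index=pd.DatetimeIndex(["2011-01-07", "2011-01-14", "2011-02-04"], name="date"),
)

# Group rows into calendar months, labelled by the last day of each month,
# then take the mean within each group.
monthly = toy.resample(pd.offsets.MonthEnd()).mean()

print(monthly)
```

The result has one row per month, indexed by 2011-01-31 and 2011-02-28, with means 15.0 and 30.0.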


-----

In [10]:
### BEGIN SOLUTION
aa_resample = dow_df[dow_df.stock=='AA'].resample('M').mean()
aa_resample
### END SOLUTION

date,quarter,open,high,low,close,volume,percent_change_price,percent_change_volume_over_last_wk,previous_weeks_volume,next_weeks_open,next_weeks_close,percent_change_next_weeks_price,days_to_next_dividend,percent_return_next_dividend
2011-01-31,1.0,16.1475,16.61,15.71,16.0775,193106700.0,-0.367042,-10.763079,207015800.0,16.2375,16.2575,0.168103,15.5,0.186635
2011-02-28,1.0,16.97,17.425,16.5975,17.1175,120521200.0,0.941184,3.056564,125120500.0,17.1275,16.9775,-0.884186,86.5,0.175302
2011-03-31,1.0,16.43,16.815,15.81,16.4525,112437500.0,0.16305,-6.480814,121795400.0,16.51,16.675,1.001313,58.5,0.182463
2011-04-30,2.0,17.182,17.734,16.672,17.176,124337200.0,0.08215,8.929478,125280900.0,17.21,17.112,-0.453783,27.2,0.174797
2011-05-31,2.0,16.8475,17.3375,16.4175,16.7475,133050500.0,-0.534822,17.156958,136449300.0,16.7125,16.44,-1.57151,57.75,0.179227
2011-06-30,2.0,15.6525,15.99,15.0225,15.2875,95705210.0,-2.19308,7.375679,90158450.0,15.275,15.385,0.807728,50.5,0.19639


In [11]:
assert_equal(type(aa_resample), pd.DataFrame, msg="aa_resample should be a DataFrame")
assert_equal(aa_resample.shape[0], 6, msg="aa_resample is not correct")
assert_equal(aa_resample.index[0].date(), date(2011, 1,31),
             msg="aa_resample is not resampled at month end frequency")
aa_resample

date,quarter,open,high,low,close,volume,percent_change_price,percent_change_volume_over_last_wk,previous_weeks_volume,next_weeks_open,next_weeks_close,percent_change_next_weeks_price,days_to_next_dividend,percent_return_next_dividend
2011-01-31,1.0,16.1475,16.61,15.71,16.0775,193106700.0,-0.367042,-10.763079,207015800.0,16.2375,16.2575,0.168103,15.5,0.186635
2011-02-28,1.0,16.97,17.425,16.5975,17.1175,120521200.0,0.941184,3.056564,125120500.0,17.1275,16.9775,-0.884186,86.5,0.175302
2011-03-31,1.0,16.43,16.815,15.81,16.4525,112437500.0,0.16305,-6.480814,121795400.0,16.51,16.675,1.001313,58.5,0.182463
2011-04-30,2.0,17.182,17.734,16.672,17.176,124337200.0,0.08215,8.929478,125280900.0,17.21,17.112,-0.453783,27.2,0.174797
2011-05-31,2.0,16.8475,17.3375,16.4175,16.7475,133050500.0,-0.534822,17.156958,136449300.0,16.7125,16.44,-1.57151,57.75,0.179227
2011-06-30,2.0,15.6525,15.99,15.0225,15.2875,95705210.0,-2.19308,7.375679,90158450.0,15.275,15.385,0.807728,50.5,0.19639
