### Data Preprocessing Module

This module defines the `DataPrep` class and its methods.
Each method handles one preprocessing task and prepares the data for feature engineering
and model training.

The module relies on NumPy and pandas for data manipulation.


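Each function is described below, along with its usage.

List of functions in this file:

- `extract_date`: adds the `Year`, `Month`, `Day of Week`, and `Quarter` columns.
- `extact_changes`: adds the `Daily Range`, `Volume Change`, and `Price Increase` columns.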

In [75]:
import numpy as np
import pandas as pd

### Extract Date
This function extracts the year, month, day of the week, and quarter from the `Date` column of the DataFrame passed to `DataPrep`.

The new columns are inserted immediately after `Date`, in that order, so that the date-related fields stay together and the DataFrame is easier to read.
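
As a minimal sketch of the extraction step on its own, the snippet below applies the pandas `.dt` accessor to a small, made-up `Date` column. The `sample` and `parts` names are illustrative assumptions and are not part of the module.

```python
import pandas as pd

# Hypothetical sample data; only the 'Date' column matters here.
sample = pd.DataFrame({'Date': ['2023-09-15', '2023-09-16', '2023-09-17']})
sample['Date'] = pd.to_datetime(sample['Date'])

# The .dt accessor exposes the calendar fields of a datetime column.
parts = pd.DataFrame({
    'Year': sample['Date'].dt.year,
    'Month': sample['Date'].dt.month,
    'Day of Week': sample['Date'].dt.dayofweek,  # Monday=0 ... Sunday=6
    'Quarter': sample['Date'].dt.quarter,
})
print(parts)
```

Note that `dt.dayofweek` returns the weekday index rather than the day of the month; `dt.day` would give the latter.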

In [112]:
class DataPrep:
    
    def __init__(self, df):
        self.df = df


    def extract_date(self):
        """
        Extract date parts from the 'Date' column and insert them as new columns.

        Parameters: 
            self.df (DataFrame): DataFrame containing the stock price data.

        Parameter constraints:
            Should have a column 'Date'
            The dates should be in a format that pd.to_datetime can parse

        Returns:
            DataFrame: The same DataFrame, modified in place, with 'Year', 'Month',
            'Day of Week', and 'Quarter' inserted after 'Date'
        """
        # Converting the Date column to pandas datetime format
        self.df['Date'] = pd.to_datetime(self.df['Date'])

        # Extracting the year and adding it after 'Date' column
        self.df['Year'] = self.df['Date'].dt.year
        index_date = self.df.columns.get_loc('Date')
        self.df.insert(index_date + 1, 'Year', self.df.pop('Year'))

        # Extracting the month and adding it after 'Year' column
        self.df['Month'] = self.df['Date'].dt.month
        index_year = self.df.columns.get_loc('Year')
        self.df.insert(index_year + 1, 'Month', self.df.pop('Month'))

        # Extracting the day of the week (Monday=0) and adding it after 'Month' column
        self.df['Day of Week'] = self.df['Date'].dt.dayofweek
        index_month = self.df.columns.get_loc('Month')
        self.df.insert(index_month + 1, 'Day of Week', self.df.pop('Day of Week'))

        # Extracting the quarter and adding it after 'Day of Week' column
        self.df['Quarter'] = self.df['Date'].dt.quarter
        index_day = self.df.columns.get_loc('Day of Week')
        self.df.insert(index_day + 1, 'Quarter', self.df.pop('Quarter'))

        return self.df
    
    # Derives the daily range, volume change, and price direction columns
    def extact_changes(self):
        """ 
        Adds the daily high-low range, the percentage change in volume, and a
        flag for whether the close rose, computed from the DataFrame's columns.

        Parameters: 
            self.df (DataFrame): DataFrame containing the stock price data.
        
        Parameter Constraints:
            Should have the columns 'High', 'Low', 'Volume', and 'Close'
        
        Returns:
            DataFrame: The same DataFrame, modified in place, with the columns
            'Daily Range', 'Volume Change', and 'Price Increase' added
        """

        # Daily range: difference between the high and the low, placed after 'Low'
        self.df['Daily Range'] = self.df['High'] - self.df['Low']
        index_low = self.df.columns.get_loc('Low')
        self.df.insert(index_low + 1, 'Daily Range', self.df.pop('Daily Range'))
        
        # Volume change: fractional change from the previous row (NaN for the first row)
        self.df['Volume Change'] = self.df['Volume'].pct_change()
        index_volume = self.df.columns.get_loc('Volume')
        self.df.insert(index_volume + 1, 'Volume Change', self.df.pop('Volume Change'))

        # Price increase: 1 if the close is higher than the previous row's close, else 0
        self.df['Price Increase'] = (self.df['Close'] > self.df['Close'].shift(1)).astype(int)
        index_close = self.df.columns.get_loc('Close')
        self.df.insert(index_close + 1, 'Price Increase', self.df.pop('Price Increase'))

        return self.df
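
To illustrate how the first row behaves in the change calculations, here is a small sketch that runs `pct_change` and `shift` on made-up volume and close values. The `prices`, `volume_change`, and `price_increase` names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample values, not real market data.
prices = pd.DataFrame({'Volume': [1200, 2300, 2400], 'Close': [22, 26, 29]})

# pct_change compares each row with the previous one.
# The first row has no predecessor, so its value is NaN.
volume_change = prices['Volume'].pct_change()

# shift(1) moves every value down one row, leaving NaN in the first row.
# Any comparison with NaN is False, so the first flag becomes 0.
price_increase = (prices['Close'] > prices['Close'].shift(1)).astype(int)

print(volume_change)
print(price_increase)
```

Because the first row of the change column is NaN, downstream steps may need to drop or fill that row before training, depending on the model.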



### _The code block below is for testing the class and its functions_


In [111]:
df = pd.DataFrame({'Date' : ['2023-09-15', '2023-09-16', '2023-09-17'],
                   'High' : [21,22,23],
                   'Low' : [15,16,17],
                   'Volume' : [1200,2300,2400],
                   'Close' : [22,26,29]
                   })

df0 = DataPrep(df).extract_date()
df0 = DataPrep(df).extact_changes()
df0

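KeyError: 'Day'

This error appears to come from an earlier version of `DataPrep` that looked up a `Day` column. The class shown above uses `Day of Week` instead, so this cell likely needs to be re-run after the class definition.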
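
A `KeyError` like the one above is what pandas raises when a column label is missing. One way to surface that earlier is to check the required columns up front. The sketch below assumes a hypothetical helper, `require_columns`, which is not part of `DataPrep`.

```python
import pandas as pd

def require_columns(df, columns):
    """Hypothetical helper: raise a ValueError naming every missing column."""
    missing = [name for name in columns if name not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")

# Made-up frame that lacks 'Volume' and 'Close'.
frame = pd.DataFrame({'Date': ['2023-09-15'], 'High': [21], 'Low': [15]})

require_columns(frame, ['Date', 'High', 'Low'])  # passes silently

try:
    require_columns(frame, ['Volume', 'Close'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Such a check could be called at the start of each method so that the error message names the missing columns instead of a single label.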