# Intro

Before training a model, the raw data for this project needs to be processed. This step assumes that the `Data` folder is already populated. The data is released under CC0 and can be downloaded from: https://www.kaggle.com/datasets/borismarjanovic/price-volume-data-for-all-us-stocks-etfs

After downloading the data, make sure that the `Data` folder contains the following two subfolders: `Data/ETFs` and `Data/Stocks`.

Remember that the goal of this project is to classify future stock returns based on their past return charts. Therefore the data needs to be transformed so that it can fulfill this goal.

# Transform the data

The raw data is not suitable for the purposes of this project. Instead the data needs to be transformed into summary statistics (historical returns, standard deviations ...) and stock chart images. The data will be transformed to the following format:

- A dataframe where each row corresponds to a single observation. This dataframe contains the following columns:

    - `date`: the start date of the observation
    - `asset_file`: the path of the file from which this observation was created
    - `stock`: a boolean indicator that is 1 if this observation comes from a stock or 0 if it comes from an ETF
    - `1_month_return`: the return of the asset in the previous calendar month (lagged by one day to avoid look-ahead bias)
    - `6_month_return`: the return of the asset in the previous 6 calendar months (lagged by one day to avoid look-ahead bias)
    - `12_month_return`: the return of the asset in the previous 12 calendar months (lagged by one day to avoid look-ahead bias)
    - `1_month_volatility`: the (daily) volatility of the asset in the previous calendar month (lagged by one day to avoid look-ahead bias)
    - `6_month_volatility`: the (daily) volatility of the asset in the previous 6 calendar months (lagged by one day to avoid look-ahead bias)
    - `12_month_volatility`: the (daily) volatility of the asset in the previous 12 calendar months (lagged by one day to avoid look-ahead bias)
    - `1_month_img`: the name of the .png file containing a GADF chart of the asset price in the previous calendar month (lagged by one day to avoid look-ahead bias)
    - `6_month_img`: the name of the .png file containing a GADF chart of the asset price in the previous 6 calendar months (lagged by one day to avoid look-ahead bias)
    - `12_month_img`: the name of the .png file containing a GADF chart of the asset price in the previous 12 calendar months (lagged by one day to avoid look-ahead bias)
    - `1_month_img_bar`: the name of the .png file containing an area chart of the asset price in the previous calendar month (lagged by one day to avoid look-ahead bias)
    - `6_month_img_bar`: the name of the .png file containing an area chart of the asset price in the previous 6 calendar months (lagged by one day to avoid look-ahead bias)
    - `12_month_img_bar`: the name of the .png file containing an area chart of the asset price in the previous 12 calendar months (lagged by one day to avoid look-ahead bias)
    - `label_1m`: the classification target. This can be 0 (next-month return < -1%), 1 (next-month return >= -1% and <= 1%), or 2 (next-month return > 1%)
    - `1m_return`: the return over the next 1 month (used for baseline regression model)

- Next to this dataframe, a folder `Charts` needs to be created that contains all the images referenced in this dataframe.
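
The one-day lag mentioned in the column descriptions can be made concrete with a small sketch. The helper name `lookback_window` is hypothetical and not part of this notebook; it only illustrates, assuming look-back periods are measured in calendar days, which dates a lagged feature may use.

```python
from datetime import date, timedelta


def lookback_window(start_date, ndays):
    """Return the (first, last) calendar dates a lagged feature may use."""
    first_day = start_date - timedelta(days=ndays)
    # One-day lag: the observation date itself is excluded from the window
    last_day = start_date - timedelta(days=1)
    return first_day, last_day


window = lookback_window(date(2020, 3, 31), ndays=30)
print(window)
```

Because the window ends the day before the observation date, no feature can contain information from the observation date itself or later.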

# Supporting functions

A number of helper functions are set up that together calculate all the features for a single asset at a single point in time. These functions can then be applied many times to construct the training, validation, and test data sets.

In [1]:
import os

import pandas as pd
from pandas.errors import EmptyDataError
from os import listdir
from os.path import isfile, join
import random as rand
from datetime import timedelta

import pprint as pp # used during code writing, not necessary

import matplotlib.pyplot as plt # used to create the images

import gc # garbage collector to save on memory

from pyts.image import GramianAngularField

import numpy as np

In [2]:
def read_asset_prices(file_path):
    """
    Read the asset prices from the given path
    :param file_path: a string containing the path to a csv file with the asset prices
    :return: a pandas dataframe with the asset prices for the given asset
    """
    return pd.read_csv(file_path, delimiter=",")

In [3]:
def get_all_assets():
    """
    Get a list of all the asset price file paths
    :return: a list of all the asset price file paths
    """
    stocks = [join("Data/Stocks", f) for f in listdir("Data/Stocks") if isfile(join("Data/Stocks", f))]
    etfs = [join("Data/ETFs", f) for f in listdir("Data/ETFs") if isfile(join("Data/ETFs", f))]

    return stocks + etfs

In [4]:
def pick_random_asset(asset_list):
    """
    Pick a random asset from the given list
    :param asset_list: a list of asset price file paths
    :return: a random entry from the given list
    """

    return rand.choice(asset_list)

In [5]:
def calculate_return(asset_prices, start_date, ndays):
    """
    Calculate the 1-day lagged return from ndays ago up until the start date for
    the given asset prices. Returns are calculated on closing prices.
    :param asset_prices: the asset prices to calculate returns from
    :param start_date: the date at which the return is calculated
    :param ndays: the number of days to look back
    :return: the return over the given period for the given asset
    """
    ndays_price = asset_prices.Close[
        asset_prices.Date == asset_prices.Date[asset_prices.Date <= start_date - timedelta(days=ndays)].max()
    ].iloc[0]

    lagged_price = asset_prices.Close[
        asset_prices.Date == asset_prices.Date[asset_prices.Date <= start_date - timedelta(days=1)].max()
    ].iloc[0]

    return lagged_price / ndays_price - 1
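
As an illustration of the lookup logic above, the following sketch computes the same kind of lagged return on a small made-up price table. It uses `Series.asof` on a date-sorted index as an alternative way to find the last available close on or before a given date; the function name `lagged_return_asof` and the toy prices are assumptions made for this example only.

```python
from datetime import timedelta

import pandas as pd


def lagged_return_asof(asset_prices, start_date, ndays):
    """Return from the last close on/before (start - ndays) to the last close on/before (start - 1 day)."""
    close = asset_prices.set_index("Date")["Close"].sort_index()
    base_price = close.asof(start_date - timedelta(days=ndays))
    lagged_price = close.asof(start_date - timedelta(days=1))
    return lagged_price / base_price - 1


# Toy data: five trading days with a weekend gap between 2020-01-03 and 2020-01-06
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]),
    "Close": [100.0, 102.0, 101.0, 104.0, 110.0],
})

# Base date is 2020-01-03 (close 102), lagged date is 2020-01-07 (close 104)
example_return = lagged_return_asof(prices, pd.Timestamp("2020-01-08"), ndays=5)
print(example_return)
```

Note that the close on the observation date itself (110) does not enter the calculation, which is the purpose of the one-day lag.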

In [6]:
def calculate_std(asset_prices, start_date, ndays):
    """
    Calculate the 1-day lagged volatility from ndays ago up until the start date for
    the given asset prices. Volatility is calculated on daily returns.
    :param asset_prices: the asset prices to calculate returns from
    :param start_date: the date at which the return is calculated
    :param ndays: the number of days to look back
    :return: the volatility over the given period for the given asset
    """
    temp = asset_prices[
        (asset_prices.Date <= start_date - timedelta(days=1)) &
        (asset_prices.Date >= start_date - timedelta(days=ndays))
    ].copy()

    temp = temp.sort_values(by="Date", ascending=True)
    temp["daily_ret"] = temp.Close / temp.Close.shift(1) - 1 # simple daily return; subtracting 1 does not change the standard deviation
    temp.dropna(inplace=True)

    return temp.daily_ret.std()
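
A minimal sketch of the volatility calculation on made-up prices is shown below. It assumes that volatility means the sample standard deviation of simple daily returns, and it also checks that the standard deviation of the price ratios `Close / Close.shift(1)` gives the same number, since the two series differ only by a constant. The name `daily_volatility` is hypothetical.

```python
import pandas as pd


def daily_volatility(close):
    """Sample standard deviation of simple daily returns."""
    returns = close.pct_change().dropna()
    return returns.std()


close = pd.Series([100.0, 110.0, 99.0, 108.9])

vol_from_returns = daily_volatility(close)
# Price ratios are the returns plus 1, so their spread is identical
vol_from_ratios = (close / close.shift(1)).dropna().std()

print(vol_from_returns, vol_from_ratios)
```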

In [7]:
def calculate_next_month_return(asset_prices, start_date):
    """
    Calculate the return for the next month.

    :param asset_prices: the asset prices to calculate returns from
    :param start_date: the date at which the return is calculated

    :return: the return over the next month for the given asset
    """
    next_month_price = asset_prices.Close[
        asset_prices.Date == asset_prices.Date[asset_prices.Date <= start_date + timedelta(days=31)].max()
    ].iloc[0]

    current_price = asset_prices.Close[
        asset_prices.Date == start_date
    ].iloc[0]

    return next_month_price / current_price - 1

In [8]:
def calculate_next_quarter_return(asset_prices, start_date):
    """
    Calculate the return for the next quarter.

    :param asset_prices: the asset prices to calculate returns from
    :param start_date: the date at which the return is calculated

    :return: the return over the next quarter for the given asset
    """
    next_quarter_price = asset_prices.Close[
        asset_prices.Date == asset_prices.Date[asset_prices.Date <= start_date + timedelta(days=90)].max()
    ].iloc[0]

    current_price = asset_prices.Close[
        asset_prices.Date == start_date
    ].iloc[0]

    return next_quarter_price / current_price - 1

In [9]:
def calculate_label(ret):
    """
    Calculate the classification label
    :param ret: the return to base the calculation on
    :return: 0 if ret < -0.01, 1 if 0.01 >= ret >= -0.01, 2 if ret > 0.01
    """
    if ret < -0.01:
        return 0
    elif ret <= 0.01:
        return 1
    else:
        return 2
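
When labels are needed for a whole column of returns at once, the same thresholds can be applied in vectorized form. The sketch below uses `numpy.select`; the function name `label_returns` and its `threshold` parameter are assumptions for illustration, with the default chosen to match the 1% cut-offs used in `calculate_label`.

```python
import numpy as np


def label_returns(returns, threshold=0.01):
    """Map returns to 0 (below -threshold), 1 (within the band) or 2 (above threshold)."""
    returns = np.asarray(returns, dtype=float)
    # Conditions are evaluated in order; the first one that holds decides the label
    return np.select(
        [returns < -threshold, returns <= threshold],
        [0, 1],
        default=2,
    )


labels = label_returns([-0.05, -0.01, 0.0, 0.01, 0.02])
print(labels)
```

Returns exactly on a boundary (-1% or 1%) fall into the middle class, as in the scalar version.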

In [10]:
def plot_prices(asset_prices, start_date, ndays, img_path):
    """
    Plot the closing prices from ndays ago up until the start date for
    the given asset prices on a Gramian Angular Field (2D representation
    of time series).

    :param asset_prices: the asset prices to plot
    :param start_date: the date at which the plot ends
    :param ndays: the number of days to look back
    :param img_path: the path to save the image to
    :return: the location of the created image
    """
    temp = asset_prices[
        (asset_prices.Date <= start_date - timedelta(days=1)) &
        (asset_prices.Date >= start_date - timedelta(days=ndays + 6))
        ].copy()

    temp = temp.sort_values(by="Date", ascending=True)
    # Smoothen the prices by taking the average of the past 6 days (including the day itself)
    # This way, daily random walks of the market are smoothened out more
    temp.Close = (temp.Close + temp.Close.shift(1) + temp.Close.shift(2) + temp.Close.shift(3) + temp.Close.shift(4) + temp.Close.shift(5)) / 6
    temp.dropna(inplace=True)

    # Keep only the requested window, reusing the smoothed prices from above
    # (selecting from asset_prices again would discard the smoothing)
    temp = temp[temp.Date >= start_date - timedelta(days=ndays)].copy()

    temp.reset_index(drop=True, inplace=True)

    # Normalize price data on first day
    temp.Close /= temp.Close.iloc[0]

    data = temp.Close.to_numpy().reshape(1, -1) # Reshape to (1 sample, n timestamps), the 2D input expected by GramianAngularField

    # get the Gramian Angular Field (auto-scales input data to -1, 1)
    gaf = GramianAngularField(method="difference") # The difference field shows directionality in the time series, while the summation field does not
    x_gaf = gaf.fit_transform(data)
    fig = plt.figure(figsize=(1, 1), dpi=30)
    plt.imshow(x_gaf[0], cmap='brg', origin='lower', vmin=-1., vmax=1.) # Choose BRG colormap so the three main color channels are represented
    plt.axis("off") # turn the axes off

    plt.savefig(img_path)

    fig.clf() # clear the figure contents
    plt.close(fig) # close the figure to release its memory and avoid displaying it
    # Note: no plt.clf() after closing, since that would create a new empty figure

    return img_path
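
To make the Gramian Angular Difference Field less of a black box, here is a minimal NumPy sketch of its usual definition: rescale the series to [-1, 1], map each value to an angle with `arccos`, and fill the matrix with the sine of the pairwise angle differences. This is an illustration of the concept under those assumptions, not the code that `pyts` runs internally, and the function name `gramian_angular_difference_field` is hypothetical. It also assumes the series is not constant, since the rescaling would otherwise divide by zero.

```python
import numpy as np


def gramian_angular_difference_field(series):
    """Return the matrix sin(phi_i - phi_j) for a 1D, non-constant series."""
    x = np.asarray(series, dtype=float)
    # Min-max rescale to [-1, 1]; clip guards against tiny floating point overshoots
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    x_scaled = np.clip(x_scaled, -1.0, 1.0)
    phi = np.arccos(x_scaled)
    # Pairwise angle differences via broadcasting
    return np.sin(phi[:, None] - phi[None, :])


field = gramian_angular_difference_field([1.00, 1.02, 1.01, 1.05, 1.04])
print(field.shape)
```

The resulting matrix is antisymmetric with a zero diagonal, which is why the difference field encodes the direction of price moves, whereas the summation field is symmetric.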

In [11]:
def plot_prices_area(asset_prices, start_date, ndays, img_path):
    """
    Plot the closing prices from ndays ago up until the start date for
    the given asset prices on an area plot

    :param asset_prices: the asset prices to plot
    :param start_date: the date at which the plot ends
    :param ndays: the number of days to look back
    :param img_path: the path to save the image to
    :return: the location of the created image
    """
    temp = asset_prices[
        (asset_prices.Date <= start_date - timedelta(days=1)) &
        (asset_prices.Date >= start_date - timedelta(days=ndays + 6))
        ].copy()

    temp = temp.sort_values(by="Date", ascending=True)
    # Smoothen the prices by taking the average of the past 6 days (including the day itself)
    # This way, daily random walks of the market are smoothened out more
    temp.Close = (temp.Close + temp.Close.shift(1) + temp.Close.shift(2) + temp.Close.shift(3) + temp.Close.shift(4) + temp.Close.shift(5)) / 6
    temp.dropna(inplace=True)

    # Keep only the requested window, reusing the smoothed prices from above
    # (selecting from asset_prices again would discard the smoothing)
    temp = temp[temp.Date >= start_date - timedelta(days=ndays)].copy()

    temp.reset_index(drop=True, inplace=True)

    # Normalize price data on first day
    temp.Close /= temp.Close.iloc[0]

    fig = plt.figure(figsize=(1, 1), dpi=30)

    plt.fill_between(temp.Date, temp.Close, ec="black", fc="black")

    plt.ylim(0.45, 2.05)

    plt.axis("off") # turn the axes off

    plt.savefig(img_path)

    fig.clf() # clear the figure contents
    plt.close(fig) # close the figure to release its memory and avoid displaying it
    # Note: no plt.clf() after closing, since that would create a new empty figure

    return img_path
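
The smoothing step in both plotting functions averages each close with the five preceding closes. The sketch below, on made-up prices, suggests that this is equivalent to a 6-period rolling mean, which may be a more compact way to express the same idea.

```python
import numpy as np
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 14.0])

# Average of the current close and the five previous closes, written out with shifts
manual = sum(close.shift(k) for k in range(6)) / 6

# The same average expressed as a rolling window of length 6
rolling = close.rolling(window=6).mean()

print(pd.DataFrame({"manual": manual, "rolling": rolling}))
```

In both versions the first five rows are `NaN` because a full window of six observations is not yet available, which is why the plotting functions request a few extra days of history before smoothing.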

In [12]:
def construct_rand_observation(asset_list, img_prefix=""):
    """
    Create a single random observation
    :param asset_list: list of potential asset files to choose from
    :param img_prefix: a prefix for the name of the created images
    :return: a random observation built from the asset list
    """

    # Pick a random asset
    asset_file = pick_random_asset(asset_list)

    try:
        asset_prices = read_asset_prices(asset_file)
    except EmptyDataError:
        return {} # Return an empty observation if the given file was empty

    try:
        # Choose an appropriate time for the observation
        asset_prices.Date = pd.to_datetime(asset_prices.Date)

        # We need data going back 1 year, so the chosen date has to be at least
        # 1 year after the minimum date for this asset (+ 1 day for the time lag).
        # We also need at least one month of future data to calculate the
        # classification target; 90 days are held back at the end of the series
        min_date = asset_prices.Date.min() + timedelta(days=365 + 1)
        max_date = asset_prices.Date.max() - timedelta(days=90)

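        # Choose a start date for the observation. The candidates are converted to a
        # list first: random.choice indexes by position, while a filtered Series is
        # indexed by its original labels and would often raise a KeyError.
        start_date = rand.choice(
            asset_prices.Date[
                (asset_prices.Date >= min_date) &
                (asset_prices.Date < max_date)
            ].tolist()
        )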
    except (IndexError, KeyError):
        # If the date selection failed (for example because no date satisfies the
        # constraints above), the asset cannot be used so we just return an empty dictionary
        return {}

    try:
        observation = {
            "asset_file": asset_file,
            "date": start_date,
            "stock": 1 if "Stocks" in asset_file else 0,
            "1_month_return": calculate_return(asset_prices, start_date, ndays=30),
            "6_month_return": calculate_return(asset_prices, start_date, ndays=180),
            "12_month_return": calculate_return(asset_prices, start_date, ndays=365),
            "1_month_volatility": calculate_std(asset_prices, start_date, ndays=30),
            "6_month_volatility": calculate_std(asset_prices, start_date, ndays=180),
            "12_month_volatility": calculate_std(asset_prices, start_date, ndays=365),
            "1_month_img": plot_prices(asset_prices, start_date, ndays=30, img_path=img_prefix + "_" + "1_month.PNG"),
            "6_month_img": plot_prices(asset_prices, start_date, ndays=180, img_path=img_prefix + "_" + "6_month.PNG"),
            "12_month_img": plot_prices(asset_prices, start_date, ndays=365, img_path=img_prefix + "_" + "12_month.PNG"),
            "1_month_img_bar": plot_prices_area(asset_prices, start_date, ndays=30, img_path=img_prefix + "_bar_" + "1_month.PNG"),
            "6_month_img_bar": plot_prices_area(asset_prices, start_date, ndays=180, img_path=img_prefix + "_bar_" + "6_month.PNG"),
            "12_month_img_bar": plot_prices_area(asset_prices, start_date, ndays=365, img_path=img_prefix + "_bar_" + "12_month.PNG"),
            "label_1m": calculate_label(calculate_next_month_return(asset_prices, start_date)),
            "1m_return": calculate_next_month_return(asset_prices, start_date)
        }
    except IndexError:
        # If one of the time series did not have enough data to calculate the 1 month image this can occur
        return {}

    return observation
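
Selecting a random date from a filtered Series needs some care, because `random.choice` picks an integer position while a filtered pandas Series keeps its original index labels. The following sketch shows one way to sidestep this by converting the eligible dates to a list first. The helper name `pick_random_date` and the `rng` parameter are assumptions introduced for this example.

```python
import random

import pandas as pd


def pick_random_date(dates, min_date, max_date, rng=random):
    """Pick a random date with min_date <= date < max_date, or None if there is none."""
    eligible = dates[(dates >= min_date) & (dates < max_date)]
    if eligible.empty:
        return None
    # A plain list is indexed by position, which is what random.choice expects
    return rng.choice(eligible.tolist())


dates = pd.Series(pd.date_range("2020-01-01", periods=10, freq="D"))

picked = pick_random_date(
    dates, pd.Timestamp("2020-01-06"), pd.Timestamp("2020-01-09"), rng=random.Random(0)
)
nothing = pick_random_date(dates, pd.Timestamp("2021-01-01"), pd.Timestamp("2021-02-01"))
print(picked, nothing)
```

Returning `None` for an empty selection makes the "asset cannot be used" case explicit instead of relying on an exception.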

In [13]:
def read_batches(path):
    """
    Read the batches stored in the path and group them
    :param path: the path with batches
    :return: a pandas dataframe with the grouped and de-duplicated batches
    """
    # Regroup batches
    batches = [pd.read_csv(join(path, f)) for f in listdir(path)]

    # After creating the random observations, potential duplicates (although unlikely) need to be dropped
    observations = pd.concat(batches)
    observations = observations.drop_duplicates(subset=["date", "asset_file"]) # drop_duplicates returns a new dataframe

    return observations
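
The de-duplication step relies on the fact that `DataFrame.drop_duplicates` returns a new dataframe and leaves the original untouched unless `inplace=True` is passed. The sketch below demonstrates this on two small made-up batches that share one observation.

```python
import pandas as pd

batch_a = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-03"],
    "asset_file": ["a.us.txt", "b.us.txt"],
    "label_1m": [0, 2],
})
batch_b = pd.DataFrame({
    "date": ["2020-01-03", "2020-01-06"],
    "asset_file": ["b.us.txt", "c.us.txt"],
    "label_1m": [2, 1],
})

combined = pd.concat([batch_a, batch_b], ignore_index=True)

# Calling the method without using its result leaves `combined` unchanged
combined.drop_duplicates(subset=["date", "asset_file"])

# Assigning the result keeps one row per (date, asset_file) pair
deduplicated = combined.drop_duplicates(subset=["date", "asset_file"]).reset_index(drop=True)

print(len(combined), len(deduplicated))
```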

In [14]:
def construct_many_rand_observations(n=10, start_from=0):
    """
    Create many different observations

    :param n: the number of observations to create
    :param start_from: the number of observations to start from (in case the script failed due to a memory error)
    :return: a dataframe with the newly created observations (and accompanying image files in a /Charts folder)
    """

    asset_list = get_all_assets()

    # Create necessary directories
    if not os.path.isdir("Charts"):
        os.makedirs("Charts")
    if not os.path.isdir("DataBatches"):
        os.makedirs("DataBatches")

    record_collector = [] # collect observations

    # Generate random observations
    for i in range(start_from, start_from + n):

        # For memory reasons the pre-processing happens in batches of 500
        if i % 500 == 0 and i != start_from:
            # Store the observations created so far
            observations = pd.DataFrame.from_records(record_collector)
            batch_identifier = "DataBatches/" + "rand_observations_" + str(i) + ".csv"
            observations.to_csv(batch_identifier, index=False)

            record_collector = [] # clear the record collector

            gc.collect() # run the garbage collector every 500 observations

        # Construct a random observation & save it
        obs = {}
        while len(obs) == 0: # create random observations until one of them is not empty
            obs = construct_rand_observation(asset_list, "Charts/img_" + str(i))

        record_collector.append(obs)

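    # Store the remaining observations that were not written as part of a full batch.
    # Checking the collector directly avoids losing the last observation when
    # start_from + n - 1 is a multiple of 500, and naming the file after
    # start_from + n avoids overwriting a batch written inside the loop.
    if record_collector:
        observations = pd.DataFrame.from_records(record_collector)
        batch_identifier = "DataBatches/" + "rand_observations_" + str(start_from + n) + ".csv"
        observations.to_csv(batch_identifier, index=False)

        record_collector = None # clear the record collector

        gc.collect() # run the garbage collector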

    # Regroup batches
    return read_batches("DataBatches")
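
The batching logic can be easier to reason about when the index ranges are computed separately from the work done per observation. The sketch below is one possible way to do that, assuming batches are cut at multiples of the batch size as in the loop above; the generator name `iter_batches` is hypothetical and not used elsewhere in this notebook.

```python
def iter_batches(start_from, n, batch_size=500):
    """Yield (first, last_exclusive) index ranges, cut at multiples of batch_size."""
    end = start_from + n
    first = start_from
    while first < end:
        # Next multiple of batch_size strictly after `first`
        next_cut = (first // batch_size + 1) * batch_size
        last = min(next_cut, end)
        yield first, last
        first = last


small_run = list(iter_batches(0, 1001))
resumed_run = list(iter_batches(3000, 3000))
print(small_run)
print(len(resumed_run))
```

Every observation index falls into exactly one range, including a trailing partial batch such as `(1000, 1001)`, so nothing is silently dropped when the total is not a multiple of the batch size.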

In [15]:
construct_many_rand_observations(3000)

Unnamed: 0,asset_file,date,stock,1_month_return,6_month_return,12_month_return,1_month_volatility,6_month_volatility,12_month_volatility,1_month_img,6_month_img,12_month_img,1_month_img_bar,6_month_img_bar,12_month_img_bar,label_1m,1m_return
0,Data/Stocks\ncr.us.txt,2004-04-05,1,0.000439,0.317341,1.388889,0.017516,0.016031,0.023119,Charts/img_500_1_month.PNG,Charts/img_500_6_month.PNG,Charts/img_500_12_month.PNG,Charts/img_500_bar_1_month.PNG,Charts/img_500_bar_6_month.PNG,Charts/img_500_bar_12_month.PNG,0,-0.013871
1,Data/Stocks\chfn.us.txt,2013-05-31,1,-0.001997,0.284303,0.399270,0.004603,0.009990,0.017845,Charts/img_501_1_month.PNG,Charts/img_501_6_month.PNG,Charts/img_501_12_month.PNG,Charts/img_501_bar_1_month.PNG,Charts/img_501_bar_6_month.PNG,Charts/img_501_bar_12_month.PNG,2,0.015084
2,Data/Stocks\tmst.us.txt,2016-01-05,1,-0.151767,-0.657503,-0.770941,0.054340,0.050027,0.039493,Charts/img_502_1_month.PNG,Charts/img_502_6_month.PNG,Charts/img_502_12_month.PNG,Charts/img_502_bar_1_month.PNG,Charts/img_502_bar_6_month.PNG,Charts/img_502_bar_12_month.PNG,0,-0.066185
3,Data/Stocks\gjp.us.txt,2012-08-16,1,-0.053945,-0.075769,0.000470,0.015497,0.013264,0.016023,Charts/img_503_1_month.PNG,Charts/img_503_6_month.PNG,Charts/img_503_12_month.PNG,Charts/img_503_bar_1_month.PNG,Charts/img_503_bar_6_month.PNG,Charts/img_503_bar_12_month.PNG,1,0.001246
4,Data/Stocks\hvt-a.us.txt,2011-09-27,1,-0.104041,-0.224486,-0.056672,0.040486,0.039216,0.033105,Charts/img_504_1_month.PNG,Charts/img_504_6_month.PNG,Charts/img_504_12_month.PNG,Charts/img_504_bar_1_month.PNG,Charts/img_504_bar_6_month.PNG,Charts/img_504_bar_12_month.PNG,2,0.127726
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Data/Stocks\tnp.us.txt,2015-08-05,1,-0.025786,0.298581,0.456090,0.025101,0.021265,0.026575,Charts/img_495_1_month.PNG,Charts/img_495_6_month.PNG,Charts/img_495_12_month.PNG,Charts/img_495_bar_1_month.PNG,Charts/img_495_bar_6_month.PNG,Charts/img_495_bar_12_month.PNG,0,-0.120906
496,Data/Stocks\srv.us.txt,2012-04-25,1,-0.086736,0.034882,-0.019129,0.010973,0.011807,0.020147,Charts/img_496_1_month.PNG,Charts/img_496_6_month.PNG,Charts/img_496_12_month.PNG,Charts/img_496_bar_1_month.PNG,Charts/img_496_bar_6_month.PNG,Charts/img_496_bar_12_month.PNG,0,-0.038409
497,Data/ETFs\ddm.us.txt,2013-04-16,0,0.011379,0.175601,0.308548,0.013038,0.013851,0.014707,Charts/img_497_1_month.PNG,Charts/img_497_6_month.PNG,Charts/img_497_12_month.PNG,Charts/img_497_bar_1_month.PNG,Charts/img_497_bar_6_month.PNG,Charts/img_497_bar_12_month.PNG,2,0.087341
498,Data/Stocks\sptn.us.txt,2015-11-30,1,-0.180311,-0.284739,-0.001140,0.035719,0.021547,0.018126,Charts/img_498_1_month.PNG,Charts/img_498_6_month.PNG,Charts/img_498_12_month.PNG,Charts/img_498_bar_1_month.PNG,Charts/img_498_bar_6_month.PNG,Charts/img_498_bar_12_month.PNG,1,0.006953


<Figure size 432x288 with 0 Axes>

In [16]:
construct_many_rand_observations(3000, 3000)

Unnamed: 0,asset_file,date,stock,1_month_return,6_month_return,12_month_return,1_month_volatility,6_month_volatility,12_month_volatility,1_month_img,6_month_img,12_month_img,1_month_img_bar,6_month_img_bar,12_month_img_bar,label_1m,1m_return
0,Data/Stocks\ncr.us.txt,2004-04-05,1,0.000439,0.317341,1.388889,0.017516,0.016031,0.023119,Charts/img_500_1_month.PNG,Charts/img_500_6_month.PNG,Charts/img_500_12_month.PNG,Charts/img_500_bar_1_month.PNG,Charts/img_500_bar_6_month.PNG,Charts/img_500_bar_12_month.PNG,0,-0.013871
1,Data/Stocks\chfn.us.txt,2013-05-31,1,-0.001997,0.284303,0.399270,0.004603,0.009990,0.017845,Charts/img_501_1_month.PNG,Charts/img_501_6_month.PNG,Charts/img_501_12_month.PNG,Charts/img_501_bar_1_month.PNG,Charts/img_501_bar_6_month.PNG,Charts/img_501_bar_12_month.PNG,2,0.015084
2,Data/Stocks\tmst.us.txt,2016-01-05,1,-0.151767,-0.657503,-0.770941,0.054340,0.050027,0.039493,Charts/img_502_1_month.PNG,Charts/img_502_6_month.PNG,Charts/img_502_12_month.PNG,Charts/img_502_bar_1_month.PNG,Charts/img_502_bar_6_month.PNG,Charts/img_502_bar_12_month.PNG,0,-0.066185
3,Data/Stocks\gjp.us.txt,2012-08-16,1,-0.053945,-0.075769,0.000470,0.015497,0.013264,0.016023,Charts/img_503_1_month.PNG,Charts/img_503_6_month.PNG,Charts/img_503_12_month.PNG,Charts/img_503_bar_1_month.PNG,Charts/img_503_bar_6_month.PNG,Charts/img_503_bar_12_month.PNG,1,0.001246
4,Data/Stocks\hvt-a.us.txt,2011-09-27,1,-0.104041,-0.224486,-0.056672,0.040486,0.039216,0.033105,Charts/img_504_1_month.PNG,Charts/img_504_6_month.PNG,Charts/img_504_12_month.PNG,Charts/img_504_bar_1_month.PNG,Charts/img_504_bar_6_month.PNG,Charts/img_504_bar_12_month.PNG,2,0.127726
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Data/Stocks\clro.us.txt,2012-10-04,1,-0.002551,-0.029221,-0.197224,0.011883,0.025439,0.027060,Charts/img_5995_1_month.PNG,Charts/img_5995_6_month.PNG,Charts/img_5995_12_month.PNG,Charts/img_5995_bar_1_month.PNG,Charts/img_5995_bar_6_month.PNG,Charts/img_5995_bar_12_month.PNG,2,0.086238
496,Data/ETFs\rwl.us.txt,2010-06-01,0,-0.071432,0.036265,0.201913,0.020085,0.011215,0.011805,Charts/img_5996_1_month.PNG,Charts/img_5996_6_month.PNG,Charts/img_5996_12_month.PNG,Charts/img_5996_bar_1_month.PNG,Charts/img_5996_bar_6_month.PNG,Charts/img_5996_bar_12_month.PNG,0,-0.057502
497,Data/ETFs\cape.us.txt,2016-11-28,0,0.052661,0.111960,0.159411,0.006238,0.007575,0.008513,Charts/img_5997_1_month.PNG,Charts/img_5997_6_month.PNG,Charts/img_5997_12_month.PNG,Charts/img_5997_bar_1_month.PNG,Charts/img_5997_bar_6_month.PNG,Charts/img_5997_bar_12_month.PNG,2,0.027056
498,Data/Stocks\hnna.us.txt,2015-08-26,1,0.254485,0.058200,0.389687,0.028195,0.024082,0.027321,Charts/img_5998_1_month.PNG,Charts/img_5998_6_month.PNG,Charts/img_5998_12_month.PNG,Charts/img_5998_bar_1_month.PNG,Charts/img_5998_bar_6_month.PNG,Charts/img_5998_bar_12_month.PNG,0,-0.020751


<Figure size 432x288 with 0 Axes>

In [17]:
construct_many_rand_observations(3000, 6000)

Unnamed: 0,asset_file,date,stock,1_month_return,6_month_return,12_month_return,1_month_volatility,6_month_volatility,12_month_volatility,1_month_img,6_month_img,12_month_img,1_month_img_bar,6_month_img_bar,12_month_img_bar,label_1m,1m_return
0,Data/Stocks\ncr.us.txt,2004-04-05,1,0.000439,0.317341,1.388889,0.017516,0.016031,0.023119,Charts/img_500_1_month.PNG,Charts/img_500_6_month.PNG,Charts/img_500_12_month.PNG,Charts/img_500_bar_1_month.PNG,Charts/img_500_bar_6_month.PNG,Charts/img_500_bar_12_month.PNG,0,-0.013871
1,Data/Stocks\chfn.us.txt,2013-05-31,1,-0.001997,0.284303,0.399270,0.004603,0.009990,0.017845,Charts/img_501_1_month.PNG,Charts/img_501_6_month.PNG,Charts/img_501_12_month.PNG,Charts/img_501_bar_1_month.PNG,Charts/img_501_bar_6_month.PNG,Charts/img_501_bar_12_month.PNG,2,0.015084
2,Data/Stocks\tmst.us.txt,2016-01-05,1,-0.151767,-0.657503,-0.770941,0.054340,0.050027,0.039493,Charts/img_502_1_month.PNG,Charts/img_502_6_month.PNG,Charts/img_502_12_month.PNG,Charts/img_502_bar_1_month.PNG,Charts/img_502_bar_6_month.PNG,Charts/img_502_bar_12_month.PNG,0,-0.066185
3,Data/Stocks\gjp.us.txt,2012-08-16,1,-0.053945,-0.075769,0.000470,0.015497,0.013264,0.016023,Charts/img_503_1_month.PNG,Charts/img_503_6_month.PNG,Charts/img_503_12_month.PNG,Charts/img_503_bar_1_month.PNG,Charts/img_503_bar_6_month.PNG,Charts/img_503_bar_12_month.PNG,1,0.001246
4,Data/Stocks\hvt-a.us.txt,2011-09-27,1,-0.104041,-0.224486,-0.056672,0.040486,0.039216,0.033105,Charts/img_504_1_month.PNG,Charts/img_504_6_month.PNG,Charts/img_504_12_month.PNG,Charts/img_504_bar_1_month.PNG,Charts/img_504_bar_6_month.PNG,Charts/img_504_bar_12_month.PNG,2,0.127726
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Data/Stocks\hpi.us.txt,2006-07-03,1,0.004950,0.032584,-0.029432,0.004578,0.005252,0.005911,Charts/img_8995_1_month.PNG,Charts/img_8995_6_month.PNG,Charts/img_8995_12_month.PNG,Charts/img_8995_bar_1_month.PNG,Charts/img_8995_bar_6_month.PNG,Charts/img_8995_bar_12_month.PNG,2,0.041791
496,Data/Stocks\srdx.us.txt,2015-02-02,1,0.036635,0.096126,-0.059885,0.018967,0.015438,0.016722,Charts/img_8996_1_month.PNG,Charts/img_8996_6_month.PNG,Charts/img_8996_12_month.PNG,Charts/img_8996_bar_1_month.PNG,Charts/img_8996_bar_6_month.PNG,Charts/img_8996_bar_12_month.PNG,2,0.039059
497,Data/Stocks\tgt.us.txt,1999-01-22,1,0.206011,0.215111,0.651367,0.022833,0.037480,0.029076,Charts/img_8997_1_month.PNG,Charts/img_8997_6_month.PNG,Charts/img_8997_12_month.PNG,Charts/img_8997_bar_1_month.PNG,Charts/img_8997_bar_6_month.PNG,Charts/img_8997_bar_12_month.PNG,2,0.059405
498,Data/Stocks\sncr.us.txt,2016-03-07,1,0.203871,-0.256612,-0.355458,0.023538,0.032191,0.028493,Charts/img_8998_1_month.PNG,Charts/img_8998_6_month.PNG,Charts/img_8998_12_month.PNG,Charts/img_8998_bar_1_month.PNG,Charts/img_8998_bar_6_month.PNG,Charts/img_8998_bar_12_month.PNG,2,0.071477


<Figure size 432x288 with 0 Axes>

In [18]:
construct_many_rand_observations(3000, 9000)

Unnamed: 0,asset_file,date,stock,1_month_return,6_month_return,12_month_return,1_month_volatility,6_month_volatility,12_month_volatility,1_month_img,6_month_img,12_month_img,1_month_img_bar,6_month_img_bar,12_month_img_bar,label_1m,1m_return
0,Data/Stocks\ncr.us.txt,2004-04-05,1,0.000439,0.317341,1.388889,0.017516,0.016031,0.023119,Charts/img_500_1_month.PNG,Charts/img_500_6_month.PNG,Charts/img_500_12_month.PNG,Charts/img_500_bar_1_month.PNG,Charts/img_500_bar_6_month.PNG,Charts/img_500_bar_12_month.PNG,0,-0.013871
1,Data/Stocks\chfn.us.txt,2013-05-31,1,-0.001997,0.284303,0.399270,0.004603,0.009990,0.017845,Charts/img_501_1_month.PNG,Charts/img_501_6_month.PNG,Charts/img_501_12_month.PNG,Charts/img_501_bar_1_month.PNG,Charts/img_501_bar_6_month.PNG,Charts/img_501_bar_12_month.PNG,2,0.015084
2,Data/Stocks\tmst.us.txt,2016-01-05,1,-0.151767,-0.657503,-0.770941,0.054340,0.050027,0.039493,Charts/img_502_1_month.PNG,Charts/img_502_6_month.PNG,Charts/img_502_12_month.PNG,Charts/img_502_bar_1_month.PNG,Charts/img_502_bar_6_month.PNG,Charts/img_502_bar_12_month.PNG,0,-0.066185
3,Data/Stocks\gjp.us.txt,2012-08-16,1,-0.053945,-0.075769,0.000470,0.015497,0.013264,0.016023,Charts/img_503_1_month.PNG,Charts/img_503_6_month.PNG,Charts/img_503_12_month.PNG,Charts/img_503_bar_1_month.PNG,Charts/img_503_bar_6_month.PNG,Charts/img_503_bar_12_month.PNG,1,0.001246
4,Data/Stocks\hvt-a.us.txt,2011-09-27,1,-0.104041,-0.224486,-0.056672,0.040486,0.039216,0.033105,Charts/img_504_1_month.PNG,Charts/img_504_6_month.PNG,Charts/img_504_12_month.PNG,Charts/img_504_bar_1_month.PNG,Charts/img_504_bar_6_month.PNG,Charts/img_504_bar_12_month.PNG,2,0.127726
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Data/Stocks\sfl.us.txt,2008-01-24,1,-0.157712,-0.122881,0.014062,0.018766,0.024021,0.020691,Charts/img_9495_1_month.PNG,Charts/img_9495_6_month.PNG,Charts/img_9495_12_month.PNG,Charts/img_9495_bar_1_month.PNG,Charts/img_9495_bar_6_month.PNG,Charts/img_9495_bar_12_month.PNG,2,0.104579
496,Data/Stocks\belfa.us.txt,2010-08-18,1,0.201255,0.056782,0.345961,0.051319,0.035508,0.033319,Charts/img_9496_1_month.PNG,Charts/img_9496_6_month.PNG,Charts/img_9496_12_month.PNG,Charts/img_9496_bar_1_month.PNG,Charts/img_9496_bar_6_month.PNG,Charts/img_9496_bar_12_month.PNG,2,0.012205
497,Data/Stocks\evhc.us.txt,2014-09-02,1,0.047564,0.074978,0.393275,0.008937,0.015356,0.017374,Charts/img_9497_1_month.PNG,Charts/img_9497_6_month.PNG,Charts/img_9497_12_month.PNG,Charts/img_9497_bar_1_month.PNG,Charts/img_9497_bar_6_month.PNG,Charts/img_9497_bar_12_month.PNG,0,-0.047149
498,Data/Stocks\snp.us.txt,2010-07-21,1,-0.045617,-0.006414,-0.041886,0.014021,0.017807,0.019190,Charts/img_9498_1_month.PNG,Charts/img_9498_6_month.PNG,Charts/img_9498_12_month.PNG,Charts/img_9498_bar_1_month.PNG,Charts/img_9498_bar_6_month.PNG,Charts/img_9498_bar_12_month.PNG,2,0.040489


<Figure size 432x288 with 0 Axes>

In [19]:
construct_many_rand_observations(3000, 12000)

Unnamed: 0,asset_file,date,stock,1_month_return,6_month_return,12_month_return,1_month_volatility,6_month_volatility,12_month_volatility,1_month_img,6_month_img,12_month_img,1_month_img_bar,6_month_img_bar,12_month_img_bar,label_1m,1m_return
0,Data/Stocks\ncr.us.txt,2004-04-05,1,0.000439,0.317341,1.388889,0.017516,0.016031,0.023119,Charts/img_500_1_month.PNG,Charts/img_500_6_month.PNG,Charts/img_500_12_month.PNG,Charts/img_500_bar_1_month.PNG,Charts/img_500_bar_6_month.PNG,Charts/img_500_bar_12_month.PNG,0,-0.013871
1,Data/Stocks\chfn.us.txt,2013-05-31,1,-0.001997,0.284303,0.399270,0.004603,0.009990,0.017845,Charts/img_501_1_month.PNG,Charts/img_501_6_month.PNG,Charts/img_501_12_month.PNG,Charts/img_501_bar_1_month.PNG,Charts/img_501_bar_6_month.PNG,Charts/img_501_bar_12_month.PNG,2,0.015084
2,Data/Stocks\tmst.us.txt,2016-01-05,1,-0.151767,-0.657503,-0.770941,0.054340,0.050027,0.039493,Charts/img_502_1_month.PNG,Charts/img_502_6_month.PNG,Charts/img_502_12_month.PNG,Charts/img_502_bar_1_month.PNG,Charts/img_502_bar_6_month.PNG,Charts/img_502_bar_12_month.PNG,0,-0.066185
3,Data/Stocks\gjp.us.txt,2012-08-16,1,-0.053945,-0.075769,0.000470,0.015497,0.013264,0.016023,Charts/img_503_1_month.PNG,Charts/img_503_6_month.PNG,Charts/img_503_12_month.PNG,Charts/img_503_bar_1_month.PNG,Charts/img_503_bar_6_month.PNG,Charts/img_503_bar_12_month.PNG,1,0.001246
4,Data/Stocks\hvt-a.us.txt,2011-09-27,1,-0.104041,-0.224486,-0.056672,0.040486,0.039216,0.033105,Charts/img_504_1_month.PNG,Charts/img_504_6_month.PNG,Charts/img_504_12_month.PNG,Charts/img_504_bar_1_month.PNG,Charts/img_504_bar_6_month.PNG,Charts/img_504_bar_12_month.PNG,2,0.127726
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Data/Stocks\sfl.us.txt,2008-01-24,1,-0.157712,-0.122881,0.014062,0.018766,0.024021,0.020691,Charts/img_9495_1_month.PNG,Charts/img_9495_6_month.PNG,Charts/img_9495_12_month.PNG,Charts/img_9495_bar_1_month.PNG,Charts/img_9495_bar_6_month.PNG,Charts/img_9495_bar_12_month.PNG,2,0.104579
496,Data/Stocks\belfa.us.txt,2010-08-18,1,0.201255,0.056782,0.345961,0.051319,0.035508,0.033319,Charts/img_9496_1_month.PNG,Charts/img_9496_6_month.PNG,Charts/img_9496_12_month.PNG,Charts/img_9496_bar_1_month.PNG,Charts/img_9496_bar_6_month.PNG,Charts/img_9496_bar_12_month.PNG,2,0.012205
497,Data/Stocks\evhc.us.txt,2014-09-02,1,0.047564,0.074978,0.393275,0.008937,0.015356,0.017374,Charts/img_9497_1_month.PNG,Charts/img_9497_6_month.PNG,Charts/img_9497_12_month.PNG,Charts/img_9497_bar_1_month.PNG,Charts/img_9497_bar_6_month.PNG,Charts/img_9497_bar_12_month.PNG,0,-0.047149
498,Data/Stocks\snp.us.txt,2010-07-21,1,-0.045617,-0.006414,-0.041886,0.014021,0.017807,0.019190,Charts/img_9498_1_month.PNG,Charts/img_9498_6_month.PNG,Charts/img_9498_12_month.PNG,Charts/img_9498_bar_1_month.PNG,Charts/img_9498_bar_6_month.PNG,Charts/img_9498_bar_12_month.PNG,2,0.040489
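
This section does not say how `label_1m` is derived from `1m_return`, so the following is only a sanity check on the rows displayed above, not the notebook's labelling logic. It copies the nine visible `(label_1m, 1m_return)` pairs into a small CSV string and reports the range of forward returns per label. The helper name `return_range_by_label` and the `ROWS` constant are hypothetical, introduced only for this illustration.

```python
import csv
import io
from collections import defaultdict

# (label_1m, 1m_return) pairs copied from the nine rows displayed above.
# Only the visible rows are used; the elided rows ("...") are not guessed.
ROWS = """label_1m,1m_return
0,-0.013871
2,0.015084
0,-0.066185
1,0.001246
2,0.127726
2,0.104579
2,0.012205
0,-0.047149
2,0.040489
"""


def return_range_by_label(csv_text):
    """Group forward returns by label and return {label: (min, max)}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[int(row["label_1m"])].append(float(row["1m_return"]))
    return {label: (min(vals), max(vals)) for label, vals in sorted(grouped.items())}


ranges = return_range_by_label(ROWS)
for label, (low, high) in ranges.items():
    print(f"label {label}: min={low:+.6f} max={high:+.6f}")
```

In these rows the labels are ordered by forward return: every label-0 row has a negative `1m_return`, the single label-1 row is close to zero, and every label-2 row is positive. The exact cut-offs between classes cannot be recovered from nine rows, so the sketch deliberately does not assume any.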



In [20]:
construct_many_rand_observations(3000, 15000)


In [21]:
construct_many_rand_observations(3000, 18000)
