# Colin Lefter

## Research question/interests

**What equity data is the most deterministic of the price of an equity such that we can compute an optimized portfolio of equities while using user input to drive our optimization algorithm?**

My research objective is to develop a scalable asset allocation and construction algorithm that implements an object-oriented design approach. This objective is an outcome of determining what equity data is the most deterministic of the price of an equity, which will be the focus for the majority of the project.

I intend to develop algorithms for constructing multiple linear regressions and Fourier transforms, among others, that I will then use to construct interactive and statistical models with Plotly and Seaborn. As such, I have a strong interest in the system design of our software and in developing helper functions that can assist all of us with processing data more efficiently. I am also looking forward to using Facebook Prophet[^1] to construct a time series forecast of a sample portfolio recommendation from our software, which can be included in our Tableau Dashboard.

### Analysis Plan
Our objective function takes a selection of columns from our data sets and searches for the top n companies that satisfy the criteria for having the highest probability of producing an optimal return on investment. These inputs themselves refer to sub-objective functions that take user-defined parameters and thresholds as input, which set the criteria for favourable performance attributes. To rank the companies from our data set, and ultimately determine what portion of capital to assign to each equity, I propose a data normalization algorithm that normalizes the favourable subset of each column of our data set. We interpret these normalized values as probabilities of equity selection and average the score of each company across all columns; the final score percentage of each company is then multiplied by the total capital specified by the user. In a broad sense, our software is composed of four general classes: "Data", "Quantitative Analysis", "Data Visualization" and "Portfolio Construction". We inherit the properties from each of these classes to build a functional data analysis chain.
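
The averaging and allocation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `allocate_capital`, the score column names and the three tickers are hypothetical, and rescaling the averaged scores so that they sum to 1 is one possible reading of "final score percentage", not a settled design choice.

```python
import pandas as pd

def allocate_capital(scores: pd.DataFrame, total_capital: float) -> pd.Series:
    """Average per-column scores and convert them into capital amounts."""
    # Average each company's score across all score columns, ignoring missing values
    mean_score = scores.mean(axis=1, skipna=True).fillna(0.0)
    total = mean_score.sum()
    if total == 0:
        raise ValueError("all scores are zero, so no allocation can be computed")
    # Assumption: rescale the averaged scores so the weights sum to 1
    weights = mean_score / total
    return weights * total_capital

# Hypothetical, already-normalized scores for three placeholder companies
scores = pd.DataFrame(
    {"P/E Score": [0.2, 0.8, 0.5], "Volatility Score": [0.6, 0.4, 0.5]},
    index=["AAA", "BBB", "CCC"],
)
allocation = allocate_capital(scores, total_capital=100_000)
print(allocation.round(2))
```

Because the weights are rescaled to sum to 1, the allocations always add up to the capital specified by the user, regardless of how many score columns are averaged.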

Our data visualization will be concerned with analyzing the influence of certain financial variables, such as Price-to-Earnings, on the price of each equity from a sample of 500 equities (from the S&P 500 index). Such analysis would begin with a statistical summary that will constitute exploratory data analysis, followed by our application of analysis algorithms that we design. The construction of a portfolio is a bonus of our project and will be made possible by the analysis algorithms we have constructed.

**Important Note**
A component of the analysis will involve the comparison of different values of financial variables with the corresponding price of each equity. This constitutes inferential analysis as we are attempting to identify a correlation on the basis of picking stocks based on expected performance. Therefore, this will require us to use past financial data and compare this data with the current price of each equity. As a result, we can only use the 3-month performance data (i.e. 3-month change in share price data) for this comparison as otherwise we would be using future data to predict past performance, which would be invalid.

#### User-defined parameters
Some initial ideas for these parameters include:
- (float) Initial capital
- (float) Additional capital per day, week or month
- (int) Intended holding period (in days)
- (boolean) Importance of dividends (validated based on capital invested)
- (String) Preferred industries (choose from a list, or select all)
- (float) Volatility tolerance (from 0 to 1, 1 indicating that volatility is not important)
- (String) Preferred companies (as a list)[^2]
- (float) Preferred degree of portfolio diversification (from 0 to 1, 1 indicating complete diversification)
- (String) Preferred investment strategy (choose from "Growth", "Value", "GARP")
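
One way to collect and validate these parameters is a small dataclass. The sketch below is an assumption about how the inputs could be grouped, not the project's final interface: the class name `UserParameters`, the field names and the default values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

VALID_STRATEGIES = ("Growth", "Value", "GARP")

@dataclass
class UserParameters:
    """Hypothetical container for the user-defined parameters listed above."""
    initial_capital: float
    capital_per_period: float = 0.0
    holding_period_days: int = 365
    dividends_important: bool = False
    # An empty list is interpreted here as "all industries"
    preferred_industries: List[str] = field(default_factory=list)
    volatility_tolerance: float = 0.5
    preferred_companies: List[str] = field(default_factory=list)
    diversification: float = 0.5
    investment_strategy: str = "GARP"

    def __post_init__(self):
        if self.initial_capital <= 0:
            raise ValueError("initial_capital must be positive")
        # Both tolerance-style parameters are defined on the interval [0, 1]
        for name in ("volatility_tolerance", "diversification"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        if self.investment_strategy not in VALID_STRATEGIES:
            raise ValueError(f"investment_strategy must be one of {VALID_STRATEGIES}")

params = UserParameters(initial_capital=100_000, volatility_tolerance=0.7, diversification=0.4)
print(params.investment_strategy)
```

Validating in `__post_init__` means an out-of-range value is rejected when the parameters are created rather than later, inside the optimization code.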

### Algorithm Plan

#### Tier 1: Threshold-based screening algorithms
- The current plan is to use these algorithms to screen the financial documents from each company by setting a minimum threshold for each financial ratio. This class of algorithms will need to conduct such screening per industry, as financial ratios are distinct from one industry to another.
- A global screening algorithm that selects companies which show favourable performance across all ratios can also be used after each ratio has been individually tested.
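
A per-industry screen of this kind could look like the sketch below. The function name `screen_by_sector`, the sample rows and the minimum values are hypothetical and chosen only for illustration; the `Sector` and `Current ratio` column names follow the columns listed in this document.

```python
import pandas as pd

def screen_by_sector(df, column, min_by_sector, sector_column="Sector"):
    """Keep rows whose value in `column` meets the minimum set for their sector."""
    # Look up each row's threshold from its sector; unknown sectors map to NaN
    thresholds = df[sector_column].map(min_by_sector)
    # Comparisons against NaN are False, so rows without a threshold are dropped
    return df[df[column] >= thresholds]

# Hypothetical sample data with placeholder tickers
sample = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC", "DDD"],
    "Sector": ["Finance", "Finance", "Technology Services", "Technology Services"],
    "Current ratio": [0.9, 1.4, 1.2, 2.5],
})
minimums = {"Finance": 1.0, "Technology Services": 2.0}
passed = screen_by_sector(sample, "Current ratio", minimums)
print(passed["Ticker"].tolist())
```

A global screen could then be built by intersecting the indices of the data frames returned for each ratio.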

#### Tier 2: Regression models
- As of now, the intent is to develop a multiple linear regression model that will attempt to determine a relationship between the yearly and quarterly performance of each company in relation to several columns of data that act as predictors. This can essentially implement the results from the threshold-based screening algorithms to only conduct this analysis on the pre-screened companies.
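
As a rough illustration of the regression step, the sketch below fits a multiple linear regression by ordinary least squares with `np.linalg.lstsq`. It is a stand-in that uses NumPy only; a least-squares model such as scikit-learn's `LinearRegression` should produce the same fit on this data. The function name `fit_linear_regression` is hypothetical, and the data is synthetic rather than taken from our data sets.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted parameter is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

# Synthetic data: two predictors and a response that is exactly linear in them
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 + 0.5 * X[:, 0] - 1.5 * X[:, 1]

intercept, coefs = fit_linear_regression(X, y)
print(intercept, coefs)
```

In the project, the columns of `X` would be the predictor columns of the pre-screened companies and `y` the performance column being explained.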

#### Tier 3: Statistical modelling algorithms
- Tier 3 denotes a class of broadly experimental statistical modelling algorithms that are applied on a pre-final portfolio to add additional points to companies that perform exceptionally well compared to others in the portfolio. For now, these algorithms constitute signal processing algorithms such as a Fourier Transform algorithm that attempts to identify peaks in numerical values that would otherwise not be apparent when examined in isolation and without further processing. Therefore, these algorithms will be used to fine-tune the capital allocation percentages for each company in the pre-final portfolio.
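
To give a sense of what a Fourier-based signal might look like, the sketch below estimates the dominant cycle length of a numerical series with NumPy's real FFT. The function name `dominant_period` and the synthetic series are assumptions for illustration; which series this would be applied to in the project is still open.

```python
import numpy as np

def dominant_period(values):
    """Return the period (in samples) of the strongest non-constant frequency."""
    values = np.asarray(values, dtype=float)
    # Remove the mean so the zero-frequency term does not dominate the spectrum
    centered = values - values.mean()
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(values), d=1.0)
    # Skip index 0 (zero frequency) when searching for the peak
    peak = np.argmax(spectrum[1:]) + 1
    return 1.0 / freqs[peak]

# Synthetic series: a constant level plus a cycle that repeats every 20 samples
t = np.arange(120)
series = 100 + 5 * np.sin(2 * np.pi * t / 20)
period = dominant_period(series)
print(period)
```

Real financial series usually contain trends and noise, so detrending or windowing would likely be needed before the spectrum is meaningful.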

#### Columns of relevance
Data set 1: Overview
- Price
- MKT Cap
- P/E
- EPS
- Sector

Data set 2: Performance
- 1M change (1 month change)
- 3-month performance
- 6-month performance
- YTD performance
- Yearly performance
- Volatility

Data set 3: Valuation
- Price / revenue
- Enterprise value

Data set 4: Dividends
- Dividend yield FWD
- Dividends per share (FY)

Data set 5: Margins
- Gross profit margin
- Operating margin
- Net profit margin

Data set 6: Income Statement
- Gross profit
- Income
- Net cash flow

Data set 7: Balance Sheet
- Current ratio
- Debt/equity
- Quick ratio

The total number of columns would be 24 in this case.

[^1]: This would mean that a few time series data sets would need to be downloaded from TradingView at the end of the project to test the demo portfolio.

[^2]: A helper function can be developed for this, where the user can just type out the name of the company and the ticker is identified.
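
A minimal sketch of such a helper is shown below, assuming the data contains `Ticker` and `Description` columns as in the overview data set. The function name `find_ticker` is hypothetical, and a plain substring match is only one possible matching rule.

```python
import pandas as pd

def find_ticker(name, companies, name_column="Description", ticker_column="Ticker"):
    """Return the tickers whose company description contains `name`, ignoring case."""
    matches = companies[name_column].str.contains(name, case=False, regex=False, na=False)
    return companies.loc[matches, ticker_column].tolist()

# Three rows in the same shape as the overview data set
companies = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "GOOG"],
    "Description": ["Apple Inc.", "Microsoft Corporation", "Alphabet Inc."],
})
print(find_ticker("apple", companies))
```

A substring match does not handle brand names that differ from the registered company name: searching for "Google" would not find "Alphabet Inc.", so an alias table or fuzzy matching may be needed.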

In [None]:
import pandas as pd
import plotly as plt
import seaborn as sns
import numpy as np
import datetime as dt
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
import plotly.graph_objects as go
import plotly.express as px
from IPython.display import display, HTML, Markdown, Latex
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly
from tqdm import tqdm, trange

# these variables will be updated to reflect the processed data at a later date
balance_sheet_df = pd.read_csv("../data/raw/us_equities_tradingview_data_balance_sheet.csv")
dividends_df = pd.read_csv("../data/raw/us_equities_tradingview_data_dividends.csv")
income_statement_df = pd.read_csv("../data/raw/us_equities_tradingview_data_income_statement.csv")
margins_df = pd.read_csv("../data/raw/us_equities_tradingview_data_margins.csv")
overview_df = pd.read_csv("../data/raw/us_equities_tradingview_data_overview.csv")
performance_df = pd.read_csv("../data/raw/us_equities_tradingview_data_performance.csv")
valuation_df = pd.read_csv("../data/raw/us_equities_tradingview_data_valuation.csv")

class QuantitativeAnalysis:
    def __init__(self, initial_capital: float=100000, capital_per_period: float=100, period: int=7, dividends_importance: bool=False, preferred_industries: str="Technology Services, Electronic Technology",
                volatility_tolerance: float=0.7, preferred_companies: str="Apple, Google, Microsoft, Amazon", diversification: float=0.4, investment_strategy: str="GARP"):
        """Includes several analysis functions that process select data across all data sets
        
        Example:
            QuantitativeAnalysis(
                initial_capital=100000, capital_per_period=100, period=7, dividends_importance=False, preferred_industries="Technology Services, Electronic Technology",
                volatility_tolerance=0.7, preferred_companies="Apple, Google, Microsoft, Amazon", diversification=0.4, investment_strategy="GARP"
            )
        Args:
            initial_capital (float): _description_
            capital_per_period (float): _description_
            period (int): _description_
            dividends_importance (bool): _description_
            preferred_industries (str): _description_
            volatility_tolerance (float): _description_
            preferred_companies (str): _description_
            diversification (float): _description_
            investment_strategy (str): _description_
            
        Todo:
            - Finalize framework
            - Finalize docstrings
        """
        
        self.initial_capital = initial_capital
        self.capital_per_period = capital_per_period
        self.period = period
        self.dividends_importance = dividends_importance
        self.preferred_industries = preferred_industries
        self.volatility_tolerance = volatility_tolerance
        self.preferred_companies = preferred_companies
        self.diversification = diversification
        self.investment_strategy = investment_strategy
        
    def multiple_linear_regression(self):
        pass

    def fourier_transform(self):
        pass
    
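    def rank(self, dataframe, column: str, threshold: float):
        """Add a min-max normalized "<column> Score" column to dataframe, in place."""
        #NOTE: need to filter out outliers
        #NOTE: need to deal with NAN values and remove/fill them
        # Keep only the values above the threshold; NaN values fail the comparison and are excluded
        self.df = dataframe.loc[dataframe[column] > threshold, column]
        self.y = np.array(self.df).reshape(-1, 1)
        self.y = preprocessing.MinMaxScaler().fit_transform(self.y).ravel()
        # Number of rows that were excluded and will therefore have no score
        self.nan_values = len(dataframe) - len(self.df)

        # Assign by index so each score stays with its own row; excluded rows become NaN
        dataframe[column + " Score"] = pd.Series(self.y, index=self.df.index)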
    
    def time_series_forecast(self):
        pass

class DataVisualization(QuantitativeAnalysis):
    def __init__(self):
        pass
    
    def score_distribution(self):
        pass
    
class PortfolioConstruction(DataVisualization, QuantitativeAnalysis):
    def __init__(self):
        pass
    
    def asset_allocation(self):
        pass
    
    def construct_portfolio(self):
        pass

In [None]:
# rank() adds the "Price Score" column to overview_df in place and returns None
QuantitativeAnalysis().rank(overview_df, "Price", 30)
overview_df.head(20)

Unnamed: 0,Ticker,Description,Price,Change %,Change,Technical Rating,Volume,Volume*Price,Market Capitalization,Price to Earnings Ratio (TTM),Basic EPS (TTM),Number of Employees,Sector,Free Cash Flow (Annual YoY Growth),Free Cash Flow Margin (FY),Free Cash Flow (Quarterly YoY Growth),Price Score
0,AAPL,Apple Inc.,143.0,10.059263,13.07,Neutral,1377778266,197022300000.0,2264578000000.0,23.914718,6.1445,164000.0,Electronic Technology,19.891773,28.261498,22.742534,0.000243
1,MSFT,Microsoft Corporation,242.71,1.20507,2.89,Neutral,639626996,155243900000.0,1806686000000.0,27.591728,9.0309,221000.0,Technology Services,16.092876,32.858728,-43.134068,0.000457
2,GOOG,Alphabet Inc.,97.95,10.391074,9.22,Neutral,503871113,49354180000.0,1258830000000.0,20.266028,5.0893,156500.0,Technology Services,56.41295,26.025291,-14.11859,0.000146
3,AMZN,"Amazon.com, Inc.",100.55,19.702381,16.55,Neutral,1457271131,146528600000.0,1025776000000.0,93.884298,1.1147,1608000.0,Retail Trade,-156.804505,-3.134379,41.031417,0.000152
4,BRK.A,Berkshire Hathaway Inc.,465039.98,-0.783208,-3670.98,Buy,79667,37048340000.0,675833500000.0,,-1194.0,372000.0,Finance,-2.301857,9.469601,-10.582135,1.0
5,TSLA,"Tesla, Inc.",166.66,35.297938,43.48,Sell,3700685851,616756300000.0,526271000000.0,54.966785,3.6018,99290.0,Consumer Durables,83.806713,9.258124,149.394856,0.000294
6,V,Visa Inc.,229.1,10.271467,21.34,Strong Buy,116325167,26650100000.0,471816300000.0,33.027942,7.1988,26500.0,Commercial Services,23.116651,60.999659,-3.375216,0.000428
7,NVDA,NVIDIA Corporation,191.62,31.120843,45.48,Buy,895694156,171632900000.0,471385200000.0,86.637454,2.3817,22473.0,Electronic Technology,73.242437,30.214758,-110.639938,0.000348
8,XOM,Exxon Mobil Corporation,113.56,2.955576,3.26,Buy,301984343,34293340000.0,467673400000.0,9.419097,12.274,63000.0,Energy Minerals,1479.227238,12.852661,111.317695,0.00018
9,UNH,UnitedHealth Group Incorporated,485.79,-8.372628,-44.39,Buy,75442815,36649370000.0,453897400000.0,22.943878,,,Health Services,17.673086,7.219847,-313.99128,0.00098
