# Project: Geometric Brownian Motion (GBM) Models and Stylized Facts
Geometric Brownian motion (GBM) is widely used as a pricing model. However, whether GBM replicates many of the statistical properties of actual pricing and return data is unclear. 
* These properties, referred to as _stylized facts_, have been observed for decades, dating back to early work by [Mandelbrot](https://en.wikipedia.org/wiki/Benoit_Mandelbrot), later studies by [Rama Cont](http://rama.cont.perso.math.cnrs.fr/pdf/empirical.pdf), and more recently [Ratliff-Crain et al.](https://arxiv.org/abs/2311.07738), who re-examined the 11 original stylized facts proposed by [Cont](http://rama.cont.perso.math.cnrs.fr/pdf/empirical.pdf) using newer data.

## Learning objectives
In this project, students will examine a few of the statistical properties (stylized facts) of return data and explore how well geometric Brownian motion models replicate these properties. 

* __Prerequisite__: Load and clean the historical dataset. The data we'll explore is daily open-high-low-close values for firms in the [S&P500 index](https://en.wikipedia.org/wiki/S%26P_500) between `01-03-2018` and `12-29-2023`.
* __Objective 1__: Are the returns in dataset $\mathcal{D}$ actually Laplace distributed?
    * `TODO`: Estimate the return data for firms in dataset $\mathcal{D}$
    * `TODO`: Classify the returns of firm $i$ as $c_{i}\in\left\{\text{normal},\text{laplace},\text{undefined}\right\}$
* __Objective 2__: Does geometric Brownian motion replicate common stylized facts?

## Setup
We set up the computational environment by including the `Include.jl` file. The `Include.jl` file loads external packages, various functions we will use in the exercise, and custom types to model the components of our example problem.

In [1]:
include("Include.jl");

[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLQuantitativeFinancePackage.jl.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-130-eCornell-Repository/courses/CHEME-132/module-2/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-130-eCornell-Repository/courses/CHEME-132/module-2/Manifest.toml`
[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-130-eCornell-Repository/courses/CHEME-132/module-2`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-130-eCornell-Repository/courses/CHEME-132/module-2/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-130-eCornell-Repository/courses/CHEME-132/module-2/Manifest.toml`
[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLQuantitativeFinancePackage.jl.git`
[32m[1m  No Ch

## Prerequisites: Load and clean the historical dataset
We gathered a daily open-high-low-close `dataset` for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2018` until `12-29-2023`, along with data for a few exchange-traded funds and volatility products during that time. 
* We load the `original_dataset` by calling the `MyMarketDataSet()` function and remove firms that do not have the maximum number of trading days. The cleaned dataset $\mathcal{D}$ is stored in the `dataset` variable and covers the firms in the list $\mathcal{L}$, which is held in the `list_of_all_firms` variable.

In [2]:
original_dataset = MyMarketDataSet() |> x-> x["dataset"];

### Clean the data
Not all tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or de-listing events. Let's collect only those tickers with the maximum number of trading days.

* First, let's compute the number of records for a company that we know has a maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days` variable:

In [3]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow;

Now, let's iterate through our data and collect only tickers with `maximum_number_trading_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [4]:
dataset = Dict{String,DataFrame}();
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days)
        dataset[ticker] = data;
    end
end
dataset;

Then, get a list of firms that we have in the cleaned-up `dataset` and save it in the `list_of_all_firms` array (we sort these alphabetically):

In [6]:
list_of_all_firms = keys(dataset) |> collect |> sort;

Finally, we set some constant values that are used throughout the study. In particular, the value of $\Delta{t}$ holds the time step that we'll use (see below for a discussion of the value), and we'll specify the number of trading days to simulate in the `T` variable:

In [7]:
Δt = (1.0/252.0);
T = 48;
all_range = range(1,stop=maximum_number_trading_days,step=1) |> collect;
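
The time step above corresponds to one trading day expressed in years, assuming the common convention of 252 trading days per year. As a minimal sketch of how these constants relate (the names `horizon_years` and `time_grid` are illustrative and not part of the course code):

```julia
Δt = 1.0/252.0   # one trading day in years (252 trading days per year convention)
T = 48           # number of trading days to simulate

# simulated horizon expressed in years
horizon_years = T * Δt

# illustrative time grid: T steps of size Δt starting at t = 0, so T + 1 points
time_grid = range(0.0, step = Δt, length = T + 1) |> collect
```

With these values, a 48-day simulation covers a little under one fifth of a trading year.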

## Objective 1: Are the returns in dataset $\mathcal{D}$ actually Laplace distributed?
One of the central stylized facts is that return distributions have `fat tails`, i.e., compared with a normal distribution with the same variance, they place more probability mass on the tails (and typically have a sharper peak near zero), so extreme returns occur more often than a normal model predicts.
* In the example for this module, we showed that only a small fraction of returns actually followed a normal distribution. However, while we suggested an alternative `Laplace` distribution for most firms in dataset $\mathcal{D}$, we did not quantitatively test this assertion.
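
One way to see what fat tails mean numerically is to compare the excess kurtosis of normal and Laplace samples. The sketch below uses only the Julia standard library; the helper names `excess_kurtosis` and `sample_laplace` are hypothetical and are not part of the course package. It draws synthetic samples rather than using dataset $\mathcal{D}$.

```julia
using Random, Statistics

# Excess kurtosis: fourth standardized moment minus 3 (zero for a normal distribution)
excess_kurtosis(x) = mean(((x .- mean(x)) ./ std(x; corrected=false)).^4) - 3.0

# Draw Laplace(0, b) samples by inverse-CDF sampling:
# x = -b * sign(u) * log(1 - 2|u|), where u is uniform on (-1/2, 1/2)
function sample_laplace(rng, n; b = 1.0)
    u = rand(rng, n) .- 0.5
    return -b .* sign.(u) .* log.(1 .- 2 .* abs.(u))
end

rng = MersenneTwister(1234)
n = 200_000
normal_samples = randn(rng, n)
laplace_samples = sample_laplace(rng, n)

k_normal = excess_kurtosis(normal_samples)    # theoretical value: 0
k_laplace = excess_kurtosis(laplace_samples)  # theoretical value: 3
```

The Laplace samples should show a clearly positive excess kurtosis, which is the signature of heavier tails than the normal distribution.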

Let's develop a procedure based on the [Anderson–Darling test](https://en.wikipedia.org/wiki/Anderson–Darling_test) and the [Kolmogorov–Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test), both exported by the [HypothesisTests.jl package](https://github.com/JuliaStats/HypothesisTests.jl), to estimate which firms in the dataset $\mathcal{D}$ follow a [Laplace distribution](https://en.wikipedia.org/wiki/Laplace_distribution).

### TODO: Estimate the return data for firms in dataset $\mathcal{D}$.
Compute the log growth rates for the firms in the `list_of_all_firms` $\mathcal{L}$ using the `log_return_matrix(...)` function.
* The `log_return_matrix(...)` takes `dataset` $\mathcal{D}$ and a list of firms $\mathcal{L}$ and computes the growth rate values for each firm as a function of time. The data is returned as a $\mathcal{D}_{i}\times\dim\mathcal{L}$ array (time on the rows, firm $i$ on the columns). We store the data in the `log_growth_array` variable:

In [8]:
log_growth_array = log_return_matrix(dataset, list_of_all_firms);
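
For intuition, the growth rate calculation for a single firm might look like the sketch below. This is not the package's `log_return_matrix(...)` implementation: the helper `log_growth_rates` is hypothetical, and it assumes an annualized log growth rate $\mu_{j} = (1/\Delta{t})\ln(S_{j}/S_{j-1})$ computed from a plain vector of made-up prices.

```julia
# Hypothetical helper (an assumption, not the package function):
# annualized log growth rates μ_j = (1/Δt) * ln(S_j / S_{j-1}) from a price vector
function log_growth_rates(prices::Vector{Float64}; Δt::Float64 = 1.0/252.0)
    n = length(prices)
    μ = Vector{Float64}(undef, n - 1)
    for j ∈ 2:n
        μ[j-1] = (1.0/Δt) * log(prices[j] / prices[j-1])
    end
    return μ
end

prices = [100.0, 101.0, 99.5, 102.0]   # made-up closing prices
μ = log_growth_rates(prices)           # one value per pair of consecutive days
```

Note that $n$ prices give $n-1$ growth rates, and that the growth rates multiplied by $\Delta{t}$ sum to the log of the overall price ratio.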

### TODO: Classify the returns of firm $i$ as $c_{i}\in\left\{\text{normal},\text{laplace},\text{undefined}\right\}$
Suppose we define the class set $\mathcal{C}\equiv\left\{\text{normal},\text{laplace},\text{undefined}\right\}$ to describe the possible types of returns. Classify the shape of the returns for each of the firms in the `dataset` $\mathcal{D}$, where for each firm $i$ we compute a classification $c_{i}\in\mathcal{C}$. For each statistical test, use a `pvalue = 0.00001` cutoff (the `p_value_cutoff` value set in the code below).
* `Normal`: Test for normality of the return for firm $i$ using a one-sample [Anderson–Darling test](https://en.wikipedia.org/wiki/Anderson–Darling_test).
* `Laplace`: If the return for firm $i$ is `NOT` normal, use a one-sample [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) and a one-sample [Anderson–Darling test](https://en.wikipedia.org/wiki/Anderson–Darling_test) to determine if the return for firm $i$ follows a `Laplace` distribution. 
* `Undefined`: If the return for firm $i$ is `NOT` normal and either of the `Laplace` tests rejects the `Laplace` hypothesis, classify firm $i$ as `undefined`.
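
To make the `Laplace` step more concrete, the sketch below fits a Laplace distribution by maximum likelihood (location = sample median, scale = mean absolute deviation from the median) and computes the one-sample Kolmogorov–Smirnov statistic by hand. It uses only the Julia standard library and made-up samples; the helpers `fit_laplace`, `laplace_cdf`, and `ks_statistic` are hypothetical names, and the sketch stops at the test statistic rather than reproducing the p-value calculation in HypothesisTests.jl.

```julia
using Statistics

# Laplace maximum-likelihood estimates:
# location = sample median, scale = mean absolute deviation from the median
function fit_laplace(x::Vector{Float64})
    m = median(x)
    b = mean(abs.(x .- m))
    return (m, b)
end

# Laplace cumulative distribution function with location m and scale b
laplace_cdf(x, m, b) = x < m ? 0.5*exp((x - m)/b) : 1.0 - 0.5*exp(-(x - m)/b)

# One-sample KS statistic: largest gap between the empirical CDF and the model CDF
function ks_statistic(x::Vector{Float64}, cdf::Function)
    xs = sort(x)
    n = length(xs)
    F = cdf.(xs)
    d_plus = maximum((1:n) ./ n .- F)       # empirical CDF above the model
    d_minus = maximum(F .- (0:n-1) ./ n)    # empirical CDF below the model
    return max(d_plus, d_minus)
end

samples = [-0.9, -0.2, -0.05, 0.0, 0.1, 0.3, 1.2]   # made-up returns
m, b = fit_laplace(samples)
D = ks_statistic(samples, x -> laplace_cdf(x, m, b))
```

A small value of `D` means the empirical and fitted distributions are close; the hypothesis test converts this statistic into a p-value that is then compared with the cutoff.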

In [30]:
return_classification_dictionary = Dict{String, Symbol}();
p_value_cutoff = 0.00001; # reject H0 when the pvalue is at or below this cutoff
for i ∈ eachindex(list_of_all_firms)
    
    ticker = list_of_all_firms[i];
    samples = log_growth_array[:,i]; # growth rates for firm i (column i)
    
    # maximum-likelihood fits of the two candidate distributions
    d_normal = fit_mle(Normal, samples)
    d_laplace = fit_mle(Laplace, samples)

    # step 1: H0 = the samples are drawn from the fitted normal distribution
    AD_test_result = OneSampleADTest(samples,d_normal) |> pvalue
    if (AD_test_result > p_value_cutoff) # fail to reject H0
        return_classification_dictionary[ticker] = :normal
    else
        # step 2: not normal, so test the fitted Laplace distribution with both KS and AD
        KS_test_result_laplace = ExactOneSampleKSTest(samples, d_laplace) |> pvalue
        AD_test_result_laplace = OneSampleADTest(samples,d_laplace) |> pvalue
        # laplace only if neither test rejects H0, otherwise undefined
        if (KS_test_result_laplace > p_value_cutoff && AD_test_result_laplace > p_value_cutoff)
            return_classification_dictionary[ticker] = :laplace
        else
            return_classification_dictionary[ticker] = :undefined
        end
    end
end
return_classification_dictionary;

[33m[1m└ [22m[39m[90m@ HypothesisTests ~/.julia/packages/HypothesisTests/r322N/src/kolmogorov_smirnov.jl:68[39m
[33m[1m└ [22m[39m[90m@ HypothesisTests ~/.julia/packages/HypothesisTests/r322N/src/kolmogorov_smirnov.jl:68[39m


In [26]:
fraction_normal = findall(x->x==:normal, return_classification_dictionary) |> length |> 
    x -> x/length(list_of_all_firms)

0.08260869565217391

In [27]:
fraction_laplace = findall(x->x==:laplace, return_classification_dictionary) |> length |> 
    x -> x/length(list_of_all_firms)

0.9173913043478261

In [21]:
fraction_undefined = findall(x->x==:undefined, return_classification_dictionary) |> length |> 
    x -> x/length(list_of_all_firms)

0.0

## Objective 2: Does geometric Brownian motion replicate common stylized facts?

## Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products or any investment or trading advice or strategy is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, limited investment or trading experience, or a low risk tolerance. Only risk capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on evaluating your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.