# Lab 1: Silver Fund Quant Data Module and Returns

In this lab we will:
- Explore how to pull data from the Silver Fund Quant data module.
- Demonstrate the different properties of returns.

## Setup

For a smooth experience with this lab, complete the following steps:

### 1. Log into the Fulton Super Computer.

To log in, you must have an account at [https://rc.byu.edu/](https://rc.byu.edu/) and be added to the `grp_quant` group by Brian Boyer.

Approval can take some time, so create an account and reach out to Brian promptly.

### 2. Clone this repo to the desired location (I prefer to have a `Projects` folder where I keep all of my repositories).

Clone the repo by running
```bash
git clone https://github.com/BYUSilverFund/sf-quant-labs.git
```

### 3. Install `uv` (Package Manager)

We use `uv` to create and manage virtual environments.

To install `uv` run

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Check that `uv` is installed by running

```bash
uv --version
```

If this returns an error, you may need to add `uv` to your `PATH`. Run:

```bash
source $HOME/.local/bin/env
```

Restart your terminal for the changes to take effect.

### 4. Create a Virtual Environment

A virtual environment gives us consistent package and Python versions across all devices.

`uv` creates a virtual environment with synced dependencies in a single command.

From inside the cloned repo, run

```bash
uv sync
```

Activate the environment by running

```bash
source .venv/bin/activate
```

## Imports

With all of the setup out of the way we will import the necessary Python packages for the lab.

- `sf_quant`: Silver Fund Quant Team package that includes modules for loading data, optimizing portfolios, backtesting, and analyzing performance.
- `datetime`: Standard-library module for creating Python `date` objects.
- `polars`: Data frame library similar to Pandas, with a cleaner expression-based API and often substantially faster execution.

In [None]:
import sf_quant as sf
import polars as pl
import datetime as dt

## Data

Use the following code to pull data for our investment universe from 2024-01-01 to 2024-12-31.

In [None]:
start = # TODO: create 2024-01-01 using the datetime library
end =  # TODO: create 2024-12-31 using the datetime library

columns = [
    'date',
    'barrid',
    # TODO: Add any other columns we will need for our analysis
    # NOTE: You can view all available columns by running sf.data.get_assets_columns() in another cell
]

df = sf.data.load_assets(
    start=start,
    end=end,
    in_universe=True,
    columns=columns
)

df
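
As a quick reminder of how `datetime.date` objects are built, here is a minimal sketch. The dates are hypothetical and chosen only for illustration; they are not the range this lab asks for.

```python
import datetime as dt

# Hypothetical dates for illustration only (not the lab's date range).
example_start = dt.date(2023, 6, 1)
example_end = dt.date(2023, 6, 30)

# date objects can be compared and subtracted.
assert example_start < example_end
print((example_end - example_start).days)  # 29
```

The constructor takes the year, month, and day as integers, in that order.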

## Log returns

### Instructions
1. Compute the log returns for each asset.
2. Compute the cumulative log returns for each asset.
3. Run the assertion cell to make sure your results are correct.

Make sure to sort prior to computing time series metrics, and use `.over()` to apply the computation in groups.

Log returns have the nice property of being additive. Use this to your advantage!
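
To see what additivity means in practice, here is a minimal pure-Python sketch. It assumes simple returns expressed as decimals (0.10 means 10%); the units of the `return` column in the lab data are not confirmed here, and the numbers are made up.

```python
import itertools
import math

# Hypothetical simple returns for one asset, as decimals (assumption).
simple_returns = [0.10, -0.05, 0.02]

# Per-period log return: log(1 + r).
log_returns = [math.log1p(r) for r in simple_returns]

# Because log returns add, the cumulative log return is just a running sum.
cumulative_log_returns = list(itertools.accumulate(log_returns))

print(cumulative_log_returns)
```

The same idea carries over to a data frame: a running sum of the log return column, computed per asset, gives the cumulative log return.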

In [None]:
def task_compute_log_returns(df: pl.DataFrame) -> pl.DataFrame:
    """
    Compute the log returns for each security and date combo.

    Args:
        df (pl.DataFrame): Data frame containing columns date, barrid, and return

    Returns:
        pl.DataFrame: Data frame containing columns date, barrid, return, and log_return
    """

    # TODO: Finish this function

    pass

df_log = task_compute_log_returns(df)

df_log

In [None]:
def task_compute_cumulative_log_returns(df_log: pl.DataFrame) -> pl.DataFrame:
    """
    Compute the cumulative log returns for each security and date combo.

    Args:
        df_log (pl.DataFrame): Data frame containing columns date, barrid, return, and log_return

    Returns:
        pl.DataFrame: Data frame containing columns date, barrid, return, log_return, and cumulative_log_return
    """

    # TODO: Finish this function

    pass

df_cum_log = task_compute_cumulative_log_returns(df_log)

df_cum_log

In [None]:
assert df_cum_log['cumulative_log_return'].max() == 2.8475532093020557

## Compounded Returns

### Instructions

1. Compute the cumulative compounded returns for each asset.
2. Run the assertion to check that your results are correct.
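
For intuition, the sketch below compounds a short series of made-up returns. It assumes the common definition of cumulative compounded return, the running product of (1 + r) minus 1, with returns as decimals; whether the lab data uses these units is an assumption.

```python
import itertools
import operator

# Hypothetical simple returns for one asset, as decimals (assumption).
simple_returns = [0.10, -0.05, 0.02]

# Running product of the growth factors (1 + r).
growth_factors = itertools.accumulate(
    (1.0 + r for r in simple_returns), operator.mul
)

# Subtract 1 to turn each growth factor back into a return.
cumulative_compounded_returns = [g - 1.0 for g in growth_factors]

print(cumulative_compounded_returns)  # approximately [0.10, 0.045, 0.0659]
```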

In [None]:
def task_compute_cumulative_compounded_returns(df_cum_log: pl.DataFrame) -> pl.DataFrame:
    """
    Compute the cumulative compounded returns for each security.

    Args:
        df_cum_log (pl.DataFrame): Data frame containing columns date, barrid, return, log_return, and cumulative_log_return

    Returns:
        pl.DataFrame: Data frame containing columns date, barrid, return, log_return, cumulative_log_return, and cumulative_compounded_return
    """

    # TODO: Finish this function

    pass

df_cum_comp = task_compute_cumulative_compounded_returns(df_cum_log)

df_cum_comp

In [None]:
assert df_cum_comp['cumulative_compounded_return'].max() == 16.245533963705515

## Exponentiation

Note that the max cumulative log return is different from the max cumulative compounded return.

Why is that?

The answer is that the cumulative log return is still in log space!

### Instructions

1. Exponentiate the cumulative log returns to put them back into the original space.
2. Check that the exponentiated returns match the cumulative compounded returns.
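
The round trip from log space back to a simple return can be sketched in pure Python. This assumes decimal returns and the product-of-(1 + r)-minus-1 definition of compounded return; the numbers are made up.

```python
import math

# Hypothetical simple returns for one asset, as decimals (assumption).
simple_returns = [0.10, -0.05, 0.02]

# Route 1: add up log returns, which stays in log space.
cumulative_log_return = sum(math.log1p(r) for r in simple_returns)

# Route 2: compound the simple returns directly.
compounded_return = math.prod(1.0 + r for r in simple_returns) - 1.0

# expm1(x) computes exp(x) - 1, mapping log space back to a simple return.
recovered_return = math.expm1(cumulative_log_return)

# The two routes round differently, so compare with a tolerance.
assert math.isclose(recovered_return, compounded_return)
```

The two routes agree only up to floating-point rounding, so a tolerance-based comparison is more robust than `==` when checking one against the other.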

In [None]:
def task_exponentiate_returns(df_cum_comp: pl.DataFrame) -> pl.DataFrame:
    """
    Exponentiate the cumulative log returns.

    Args:
        df_cum_comp: Data frame containing date, barrid, return, log_return, cumulative_log_return, and cumulative_compounded_return.

    Returns:
        pl.DataFrame: Data frame containing all previous columns plus exponentiated_return
    """

    # TODO: Finish this function

    pass

df_exp = task_exponentiate_returns(df_cum_comp)

df_exp

In [None]:
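# Compare with a small tolerance: the two columns are computed along different floating-point paths.
assert abs(df_exp['cumulative_compounded_return'].max() - df_exp['exponentiated_return'].max()) < 1e-9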