# Gradient Feature

This notebook shows how a single feature is built. It walks through each step of the `build_features_by_dt` function (and its components) in the `releat/data/pipeline.py` script.

## PREREQUISITE - Download tick data

Before running this notebook, download tick data from brokers by running the following command from within the docker container: 

`/.venv/bin/python /workspaces/releat/workflows/download_mt5_data.py`

Alternatively, you can run it from your local terminal and have it execute inside your docker container. Replace `<container-name>` with the name of the container, which should be either `releat` or `releat-dc`, depending on how you set it up:

`docker exec -it <container-name> /.venv/bin/python /workspaces/releat/workflows/download_mt5_data.py`

In [None]:
from releat.utils.logging import get_logger
from releat.utils.configs.config_builder import load_config
from releat.data.pipeline import load_raw_tick_data
from releat.data.cleaning import group_tick_data_by_time
from releat.data.simple.stats import calc_gradient_feature
from releat.data.cleaning import fill_trade_interval
from releat.data.transformers import get_transform_params
import logging
import polars as pl
import pandas as pd
logger = get_logger(__name__, log_level=logging.INFO)

## Load feature config and data

- For this example, see `/agents/t00001/feature_config.py`
- The `load_config` function validates the configs via pydantic and combines all the other config files in the `/agents/t00001` folder

In [None]:
config = load_config('t00001')

In [None]:
# Index of the feature group - in this case we want the 5m timeframe
feat_group_ind = 1

# Index of the feature within the feature group
feat_ind = 3

feat_group = config.features[feat_group_ind]
fc = feat_group.simple_features[feat_ind]

# the simple config that defines a single feature (conversion to dict is for printing only)
dict(fc)

In [None]:
# load tick data
dt = '2023-06-01'
symbol = fc.symbol
broker = fc.broker
tick_df = load_raw_tick_data(config, broker, symbol, dt)

# For this example, reduce sample size so that it runs quickly
tick_df = tick_df.head(100_000)

# Note that this is a polars dataframe
tick_df.head(10)

In [None]:
df_group = group_tick_data_by_time(config, feat_group_ind, tick_df)

# Print some summary statistics
summary = df_group.agg(
    [
        pl.col("time_msc").min().alias("min_datetime"),
        pl.col("time_msc").max().alias("max_datetime"),
        pl.col("time_msc").count().alias("num_ticks"),
        pl.col("avg_price").last().alias("price")
    ]
)
summary.head(10)

For this group by, note that:

- the column `time_msc` will be used as the index for building the feature
- the column `time_msc` increments in steps of 10s, which is defined by the `trade_timeframe` parameter in `agents/t00001/agent_config.py`
- the `min_datetime` and `max_datetime` look forward, i.e. for the timestamp `2023-05-31 00:01:20`, the maximum datetime in that group is `2023-05-31 00:06:14.865`. The time shift is applied after the feature is built, i.e. the timestamp for this group will later be converted to `2023-05-31 00:06:20`
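To make the forward-looking grouping concrete, here is a minimal standard-library sketch. It does not call `group_tick_data_by_time`; the constants and helper names (`TRADE_STEP`, `FEATURE_WINDOW`, `window_for`, `labels_containing`) are hypothetical and simply mirror the 10s trade timeframe and 5m feature timeframe used in this example.

```python
from datetime import datetime, timedelta

# Assumed values mirroring this example: 10s trade timeframe, 5m feature timeframe.
TRADE_STEP = timedelta(seconds=10)
FEATURE_WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def window_for(label):
    """Return the [start, end) range of ticks grouped under a left-boundary label."""
    return label, label + FEATURE_WINDOW


def labels_containing(tick):
    """List every left-boundary label whose forward-looking window holds the tick."""
    # Snap the tick down to the 10s grid, then walk back across one feature window.
    steps = (tick - EPOCH) // TRADE_STEP
    latest = EPOCH + steps * TRADE_STEP
    n_labels = FEATURE_WINDOW // TRADE_STEP
    return [latest - i * TRADE_STEP for i in range(n_labels)]


start, end = window_for(datetime(2023, 5, 31, 0, 1, 20))
labels = labels_containing(datetime(2023, 5, 31, 0, 6, 14, 865000))
```

With these assumed values, each tick belongs to 30 overlapping groups, and the earliest label containing the `00:06:14.865` tick is `00:01:20`, which matches the summary above.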

## Make Feature

This is mostly taken from the `make_feature` function of `releat/data/pipeline.py`
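The internals of `calc_gradient_feature` are not shown in this notebook. As rough intuition only, the sketch below assumes that a gradient feature is the least-squares slope of price against time, expressed in pips per second. The function `slope_in_pips` is hypothetical and is not part of `releat`; the real implementation may differ.

```python
def slope_in_pips(times_s, prices, pip):
    """Least-squares slope of price over time, in pips per second (illustrative only)."""
    n = len(prices)
    if n < 2:
        return 0.0  # a slope needs at least two points
    mean_t = sum(times_s) / n
    mean_p = sum(prices) / n
    var_t = sum((t - mean_t) ** 2 for t in times_s)
    if var_t == 0:
        return 0.0  # all ticks share one timestamp, slope is undefined
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_s, prices))
    # Divide by the pip size so the slope is comparable across symbols.
    return cov / var_t / pip


# A price that rises by one pip every second should give a slope close to 1.
example_slope = slope_in_pips([0, 1, 2, 3], [1.1000, 1.1001, 1.1002, 1.1003], pip=0.0001)
```

Dividing by `pip` is the reason a pip size is looked up from `config.symbol_info` before the feature is computed in the next cell, under the assumption stated above.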

In [None]:
feature_timeframe = fc.timeframe
trade_timeframe = config.raw_data.trade_timeframe
pip = config.symbol_info[config.symbol_info_index[fc.symbol]].pip

# make the gradient feature
df = calc_gradient_feature(df_group, fc, pip)

df.head(10)

As noted above, we then clean the `time_msc` column by making sure it has the correct type and adding a time offset. The timestamp label for each feature should refer to the right boundary, i.e. the gradient feature for `2023-05-31 00:06:20` refers to tick data that occurs between `2023-05-31 00:01:20` (inclusive) and `2023-05-31 00:06:20` (exclusive).

We also shift the feature by a trade time offset, which represents the lag, in seconds, between the information becoming available and the agent making a trade. For this example, this lag is set to 3s.
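The two shifts can be checked by hand with plain `datetime` arithmetic. This is a minimal sketch that assumes the 5m feature timeframe and the 3s trade time offset described above; `feature_timestamp` is a hypothetical helper, not a `releat` function.

```python
from datetime import datetime, timedelta

feature_timeframe = timedelta(minutes=5)  # assumed from the 5m feature group
trade_time_offset = timedelta(seconds=3)  # the 3s lag mentioned above


def feature_timestamp(left_label):
    """Move a left-boundary label to the right boundary, then add the trade lag."""
    right_boundary = left_label + feature_timeframe
    return right_boundary + trade_time_offset


shifted = feature_timestamp(datetime(2023, 5, 31, 0, 1, 20))
print(shifted)  # 2023-05-31 00:06:23
```

The group labelled `00:01:20` therefore ends up at `00:06:23`: the right boundary `00:06:20` plus the 3s lag.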

In [None]:
df = df.with_columns(pl.col("time_msc").dt.cast_time_unit("ns"))
# Shift the label to the right boundary of the window, then add the trade lag
df = df.with_columns(
    pl.col("time_msc").dt.offset_by(feature_timeframe),
).with_columns(
    pl.col("time_msc").dt.offset_by(config.raw_data.trade_time_offset),
)
df.head(10)

In [None]:
# Fill any NAs in the dataset according to the feature config, also fill in any missing
# timeframes
print(f"feature set length before fill: {len(df)}")
df = fill_trade_interval(df, trade_timeframe, fc.fillna)
print(f"feature set length after fill: {len(df)}")

## Scale and Transform Feature

Note that this overwrites any existing scaling parameters in `data/agent/t00001/features/1_5m/3_grad/transforms`
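The overwrite behaviour matters because scaling parameters are fitted on one dataset and then reused. The sketch below is illustrative only: it assumes a simple standardisation (mean and standard deviation) stored as JSON, whereas the actual transforms are whatever `fc.transforms` specifies. The names `fit_and_save`, `apply_scaling` and `params.json` are hypothetical.

```python
import json
import statistics
import tempfile
from pathlib import Path


def fit_and_save(values, path):
    """Fit mean/std on the values and write them to path, replacing any old file."""
    params = {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}
    Path(path).write_text(json.dumps(params))
    return params


def apply_scaling(values, params):
    """Standardise values with previously fitted parameters."""
    std = params["std"] or 1.0  # guard against a constant feature
    return [(v - params["mean"]) / std for v in values]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "params.json"
    first = fit_and_save([1.0, 2.0, 3.0], path)
    second = fit_and_save([10.0, 20.0, 30.0], path)  # silently replaces the first fit
    stored = json.loads(path.read_text())

scaled = apply_scaling([1.0, 2.0, 3.0], first)
```

After the second call only the second fit is on disk, so anything that later loads the stored parameters will scale with the most recent run.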

In [None]:
# The transforms are specified per feature config
fc.transforms

In [None]:
cols = [x for x in df.columns if x != "time_msc"]
feats = df.select(cols).to_numpy()

feats_t = get_transform_params(config, feat_group_ind, feat_ind, feats)
feats_t

## Visualize Feature

For the purposes of visualising, the datasets are only roughly joined together, by position rather than by timestamp. The filled-in dataset has more records than the initial summary, but in this example the difference is small and is ignored.

In [None]:
# Truncate the transformed feature to the length of the summary (positional alignment only)
feats_t = feats_t[: len(summary), 0]
summary = summary.to_pandas()
summary["feature"] = feats_t
summary.head(10)

In [None]:
summary.set_index("time_msc",inplace=True)
summary = summary[["price","feature"]]
summary.iloc[800:1000].plot(secondary_y='price', figsize=(8, 5))