<a href="https://colab.research.google.com/github/tecton-ai/demo-notebooks/blob/main/Tecton_Building_On_Demand_Features.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ⏱️ Building On-Demand Features

---

##### 💡 **NOT YET A TECTON USER?**

Sign-up at [explore.tecton.ai](https://explore.tecton.ai) for a free account that lets you try out this tutorial and explore Tecton's Web UI.

---

Many critical features for real-time models can only be calculated at the time of a request, either because:

1.   They require data that is only available at request time (e.g. a user's current location)
2.   They can't efficiently be pre-computed (e.g. computing the embedding similarity between all possible users)

Running transformations at request time can also be useful for:

1.   Post-processing feature data (e.g. imputing null values)
2.   Running additional transformations after Tecton-managed aggregations
3.   Defining new features without needing to rematerialize Feature Store data

For more details, see [On-Demand Feature Views](https://docs.tecton.ai/docs/defining-features/feature-views/on-demand-feature-view).


This is where "On-Demand" features come in. In Tecton, an On-Demand Feature View lets you calculate features at the time of a request, using either data passed in with the request or pre-computed batch and stream features.
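As a rough illustration of request-time post-processing, the sketch below imputes missing pre-computed values with defaults. It is plain Python rather than Tecton SDK code: `impute_nulls` and the feature name `transaction_count_30d` are hypothetical and exist only for this example.

```python
# Hypothetical helper, not part of the Tecton SDK: replaces missing (None)
# feature values with defaults, the kind of post-processing that can run
# at request time.
def impute_nulls(features, defaults):
    """Return a copy of `features` with None values replaced from `defaults`."""
    return {
        name: defaults.get(name) if value is None else value
        for name, value in features.items()
    }


# Illustrative pre-computed feature values; the names and numbers are made up.
precomputed = {"yearly_average": None, "transaction_count_30d": 4}
imputed = impute_nulls(precomputed, defaults={"yearly_average": 0.0})
print(imputed)
```

A feature with no entry in `defaults` keeps its `None` value, so the caller decides which features are safe to impute.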

This tutorial shows how to develop, test, and productionize on-demand features for real-time models. It is centered around a fraud detection use case, where we need to predict in real time whether a transaction that a user is making is fraudulent.

---

##### 🗒️ **NOTE**

This tutorial assumes some basic familiarity with Tecton. If you are new to Tecton, we recommend first checking out the [Building a Production AI Application with Tecton](https://docs.tecton.ai/docs/tutorials/building-a-production-ai-application) tutorial, which walks through an end-to-end journey of building a real-time ML application with Tecton.


---

## ⚙️ Install Pre-Reqs

First things first, let's install the Tecton SDK and other libraries used in this tutorial by running the cell below.

In [None]:
!pip install 'tecton[rift]==1.0.0' gcsfs s3fs -q

## ✅ Log in to Tecton




Next we will authenticate with your organization's Tecton account and import libraries we will need.

If you just signed up via `explore.tecton.ai`, you can leave this step as
is. If your organization has its own Tecton account, replace
`explore.tecton.ai` with your account URL.

*Note: You need to press `enter` after pasting in your authentication code.*

In [None]:
import tecton

tecton.login("explore.tecton.ai")  # replace with your URL

Let's then run some basic imports and setup that we will use later in the tutorial.

In [5]:
from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
import pandas as pd

tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

## 👩‍💻 Create an on-demand feature that leverages request data

Let's say that for our fraud detection model, we want to leverage information about the user's current transaction that we are evaluating. We only have access to that information at the time of evaluation, so any features derived from it need to be computed in real time.

On-Demand Feature Views are able to leverage real-time request data for building features. In this case, we will do a very simple check to see if the current transaction amount is over $1000. This is a pretty basic feature, but in the next section we will look at how to make it better!

To define an on-demand feature that leverages request data, we first define a Request Source. The Request Source specifies the expected schema for the data that will be passed in with the request.

##### ℹ️ **INFO**
When using `mode="python"`, the inputs and outputs of the On-Demand Feature View are dictionaries.

For more information on modes in On-Demand Feature Views, see [On-Demand Feature Views](https://docs.tecton.ai/docs/defining-features/feature-views/on-demand-feature-view#how-to-choose-between-pandas-and-python-mode).

---

In [6]:
transaction_request = RequestSource(schema=[Field("amount", Float64)])


@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=[Field("transaction_amount_is_high", Bool)],
)
def transaction_amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] > 1000}

Now that we've defined our feature, we can test it out with some mock data using
`.run_transformation()`.


In [None]:
input_data = {"transaction_request": {"amount": 182.4}}

transaction_amount_is_high.run_transformation(input_data=input_data)
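Because the transformation body is ordinary Python, its threshold behavior can also be checked without Tecton. The sketch below copies the comparison into an undecorated function. The names `amount_is_high` and `THRESHOLD` are introduced here for illustration only.

```python
# Plain-Python copy of the transformation body, without the Tecton decorator,
# so the threshold behavior can be checked in isolation.
THRESHOLD = 1000  # same cutoff as in the feature view above


def amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] > THRESHOLD}


for amount in (182.4, 1000.0, 1000.01):
    print(amount, amount_is_high({"amount": amount}))
```

The comparison is strict, so a transaction of exactly $1000 is not flagged as high.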

This feature is okay, but wouldn't it be much better if we could compare the transaction amount to the user's historical average?

## 🔗 Create an on-demand feature that leverages request data and other features

On-Demand Feature Views also have the ability to depend on Batch and Stream Feature Views as input data sources. We can use this capability to improve our feature. Let's take a look.

First we will create a Batch Feature View that computes the user's 1-year average transaction amount. Then we will add this as a source in a new On-Demand Feature View with an updated feature transformation.

In [9]:
transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

user = Entity(name="user", join_keys=["user_id"])


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(function="mean", column="amount", time_window=timedelta(days=365), name="yearly_average"),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amount"]]


transaction_request = RequestSource(schema=[Field("amount", Float64)])


@on_demand_feature_view(
    sources=[transaction_request, user_transaction_averages],
    mode="python",
    schema=[Field("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    # Fall back to 0 when no yearly average is available (the value is None)
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}

We can again test our new feature using `.run_transformation()` and passing in
example data.

In [None]:
input_data = {"transaction_request": {"amount": 182.4}, "user_transaction_averages": {"yearly_average": 33.46}}

transaction_amount_is_higher_than_average.run_transformation(input_data=input_data)
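One consequence of the `or 0` fallback is worth checking with a few inputs. The sketch below mirrors the comparison in plain Python. It assumes a missing average arrives as `None`, and `is_higher_than_average` is a hypothetical stand-in for the decorated function.

```python
def is_higher_than_average(amount, yearly_average):
    # Mirrors `user_transaction_averages["yearly_average"] or 0`:
    # a None average is treated as 0.
    amount_mean = yearly_average or 0
    return amount > amount_mean


print(is_higher_than_average(182.4, 33.46))  # amount above the average
print(is_higher_than_average(5.0, 33.46))    # amount below the average
print(is_higher_than_average(5.0, None))     # no average available
```

With no average available, any positive amount counts as higher than average. Whether that default suits a fraud model is a design choice to make deliberately.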

Great! Now that this feature looks to be doing what we want, let's see how we can generate training data with it.

## 🧮 Generating Training Data with On-Demand Features

When generating training datasets for on-demand features, Tecton uses the exact same transformation logic as it does online to eliminate online/offline skew.

The Python function you defined will be executed as a UDF on the training dataset.
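Conceptually, this means one row-level function serves both paths. The sketch below is a simplified plain-Python picture of that idea, not a description of how Tecton executes the UDF internally. The event values are made up for illustration.

```python
def higher_than_average(event):
    # Same row-level logic for a single online request and for each offline event.
    amount_mean = event["yearly_average"] or 0
    return {"transaction_amount_is_higher_than_average": event["amount"] > amount_mean}


# Illustrative historical events; the values are invented for this example.
historical_events = [
    {"user_id": "user_1", "amount": 182.4, "yearly_average": 33.46},
    {"user_id": "user_2", "amount": 12.0, "yearly_average": 50.0},
    {"user_id": "user_3", "amount": 7.5, "yearly_average": None},
]

# Offline path: apply the function to every event and attach the result.
training_rows = [{**event, **higher_than_average(event)} for event in historical_events]

# Online path: apply the very same function to one incoming request.
online_result = higher_than_average(historical_events[0])

for row in training_rows:
    print(row)
```

Because both paths call the same function, the first training row and the online result agree by construction.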

To see this in action, we will first load up a set of historical training events.

##### ℹ️ **INFO**

Tecton expects that any request data passed in online is present in the set of historical training events. In our example below, this is represented by the `amount` column.

---
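A quick pre-flight check can confirm that the events carry every request field before generating training data. The sketch below is plain Python with a hypothetical helper, `missing_request_fields`. The column and field lists are written out by hand to match this tutorial's events and Request Source.

```python
def missing_request_fields(event_columns, request_fields):
    """Return the request fields that are absent from the event columns."""
    return [field for field in request_fields if field not in event_columns]


# Columns of the training events and the Request Source fields from this tutorial.
event_columns = ["user_id", "timestamp", "amount", "is_fraud"]
request_fields = ["amount"]

missing = missing_request_fields(event_columns, request_fields)
if missing:
    raise ValueError(f"Training events are missing request fields: {missing}")
print("All request fields are present in the training events.")
```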

In [None]:
# Retrieve our dataset of historical transaction data
transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})

# Retrieve our dataset of labels containing transaction_id and is_fraud (set to 1 if the transaction is fraudulent or 0 otherwise)
training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})

# Join our label dataset to our transaction data to produce a list of training events
training_events = training_labels.merge(transactions_df, on=["transaction_id"], how="left")[
    ["user_id", "timestamp", "amount", "is_fraud"]
]

display(training_events.head(5))

Now we can add our On-Demand Feature View to a Feature Service and generate training data for these historical events.

##### 🗒️ **NOTE**

We also included the dependent Batch Feature View in the Feature Service to make the data easier to inspect, but including it is not required.

---

In [None]:
from tecton import FeatureService


fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)

training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas().fillna(0)
display(training_data.head(5))

We can now use this training dataset to train a model with our new feature.

Once we are happy with our On-Demand Feature View, we can copy the definitions into our Feature Repository and apply our changes to a live workspace using the Tecton CLI.

Follow the [instructions](https://docs.tecton.ai/docs/tutorials/building-on-demand-features#-run-on-demand-features-in-production)
to run on-demand features in production.