# On-Demand Feature Views (ODFVs) Tutorial

Tecton has 3 basic types of Feature Views:
- [Batch Feature View](https://docs.tecton.ai/docs/defining-features/feature-views/batch-feature-view)
- [Stream Feature View](https://docs.tecton.ai/docs/defining-features/feature-views/stream-feature-view)
- [On-Demand Feature View](https://docs.tecton.ai/docs/defining-features/feature-views/on-demand-feature-view)

In this tutorial we'll focus on **On-Demand Feature Views**.

In [5]:
!pip install snowflake-snowpark-python[pandas]



## What is an On-Demand Feature?

Most of the features that you'll build in Tecton are **precomputed** -- this means that Tecton will run the data pipelines needed to compute these features before they are needed, and your ML applications will simply look up precomputed feature values from Tecton.

In some scenarios, the model of precomputing features doesn't make sense, and instead you'd rather compute the value of a feature **on-demand**.  Some examples:
* You need access to data that is only available just before you need to make a prediction
  * (example) a user is making a transaction, and you want to compute features about the transaction
  * (example) a user just filled out a form in your application, and you want to featurize the data they entered
* Precomputing features is inefficient because most of the features will never be used
  * (example) you want to calculate two users' mutual friends, but precomputing mutual friends for every pair of users is infeasible

For these scenarios, Tecton supports **On-Demand Features** -- features that are computed dynamically at the moment you request features for inference.  Inputs to an On-Demand Feature View can come from the request sent to Tecton, from feature data retrieved from the feature store, or both.
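
To make the mutual-friends scenario concrete, here is a minimal plain-Python sketch (not Tecton code) that computes the value only for the pair of users named in a request. The `FRIENDS` mapping and the `mutual_friend_count` helper are hypothetical names used purely for illustration.

```python
# Hypothetical in-memory friend lists; a real system would read these from a store.
FRIENDS = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
    "carol": {"alice", "bob"},
}

def mutual_friend_count(user_a, user_b, friends=FRIENDS):
    """Count the friends shared by two users, computed only when requested."""
    return len(friends.get(user_a, set()) & friends.get(user_b, set()))

print(mutual_friend_count("alice", "bob"))  # prints 1 (only "carol" is shared)
```

Precomputing this value for n users means materializing on the order of n² pairs, most of which would never be requested; computing it on demand touches only the pair in the request.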

## How do they work?

### Writing On-Demand Features / Modes Available
On-Demand Features are written in declarative code, just like all other features in Tecton.  The transformation is written in Python or pandas code, depending on the `mode` specified in the decorator.

### At Inference Time
At inference time, the transformation logic for on-demand features runs directly on the Tecton-managed serving infrastructure. Tecton has developed an efficient method to invoke Python functions at serving time without adding significant overhead. How this works:

1. When you invoke [Tecton's Feature Serving API](https://docs.tecton.ai/v2/examples/fetch-real-time-features.html), you include any request-time data that needs to be processed by one or more on-demand features.
2. While Tecton is looking up any precomputed features, Tecton will also invoke your on-demand transformation logic to compute the on-demand features on the fly.
3. Tecton will return a feature vector that includes both the precomputed and on-demand features that you requested from the API.
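
The three steps above can be pictured with the following conceptual sketch. It is plain Python, not Tecton's implementation: `ONLINE_STORE`, `amount_vs_average`, and `get_feature_vector` are hypothetical names, and the lookup and the transformation run sequentially here for simplicity.

```python
# Hypothetical stand-in for precomputed features keyed by entity (user) id.
ONLINE_STORE = {"user_1": {"avg_amt_30d": 42.0}}

def amount_vs_average(request, precomputed):
    """On-demand transformation: compares request data with a precomputed feature."""
    avg = precomputed.get("avg_amt_30d")
    is_above = int(avg is not None and request["amt"] > avg)
    return {"amt_above_30d_avg": is_above}

def get_feature_vector(user_id, request):
    # Step 1: the caller supplies request-time data alongside the entity key.
    # Step 2: look up precomputed features and run the on-demand logic.
    precomputed = ONLINE_STORE.get(user_id, {})
    on_demand = amount_vs_average(request, precomputed)
    # Step 3: return both kinds of features in a single vector.
    return {**precomputed, **on_demand}

print(get_feature_vector("user_1", {"amt": 100.0}))
# prints {'avg_amt_30d': 42.0, 'amt_above_30d_avg': 1}
```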

### At Training Time
At training time, Tecton makes it easy to run the exact same transformation logic against your historical data.  Specifically, Tecton will turn your Python transformation into a Python UDF that can efficiently run your transformation logic against large datasets.
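
As a rough illustration of "same logic, two contexts", the sketch below writes a transformation once and then applies it both to a single request and to a list of historical events. It is plain Python with hypothetical names (`amount_is_large`, `historical_rows`); how Tecton actually executes the UDF at scale is not shown here.

```python
def amount_is_large(row):
    """Transformation written once, in a dict-in/dict-out style."""
    return {"amount_is_large": int(row["amt"] > 100)}

# Serving: one request at a time.
online_result = amount_is_large({"amt": 250.0})

# Training: the identical function mapped over historical events.
historical_rows = [{"amt": 12.5}, {"amt": 250.0}, {"amt": 100.0}]
training_features = [amount_is_large(row) for row in historical_rows]

print(online_result)      # prints {'amount_is_large': 1}
print(training_features)  # prints [{'amount_is_large': 0}, {'amount_is_large': 1}, {'amount_is_large': 0}]
```

Because both paths call the same function, the feature values used for training cannot drift from the ones computed at serving time.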

#### Speed
Note that the transformation is your own code, so its efficiency directly affects serving latency.  Two modes are supported, `python` and `pandas`; the former is the faster of the two for real-time serving.

## Tutorial: Building an On-Demand Feature

In this tutorial, we'll walk through a few examples of usage patterns for On-Demand Feature Views.
We will build 2 ODFVs:
* Credit score binning + sum of outgoing transactions based on request-time data
* Number of days between the user's last transaction and the current transaction

### 1. ODFV 1: Processing a JSON Payload on the request

In this example, we will featurize some data that a client has passed to Tecton in real time.  The client has reached out to a third-party API (e.g., Plaid) and received a credit score and a series of past transactions.  This data will be provided in JSON format.

Sample data:
<pre>
{
  "TRANSACTIONS": [
    {"USER_ID": "miket", "AMT": 141.55, "TIMESTAMP": "2023-01-10 11:05:21"},
    {"USER_ID": "miket", "AMT": -2000.00, "TIMESTAMP": "2023-01-10 13:43:09"},
    {"USER_ID": "miket", "AMT": 317.95, "TIMESTAMP": "2023-01-10 12:27:57"},
    {"USER_ID": "miket", "AMT": -500.00, "TIMESTAMP": "2023-01-10 19:19:32"},
    {"USER_ID": "miket", "AMT": 411.19, "TIMESTAMP": "2023-01-10 21:51:46"}
  ],
  "CREDIT_SCORE": 743
}
</pre>

We will create two features from this data.

1. A binary `credit_score_is_high`: 1 if the score is above 730, 0 if it is not.
2. An aggregation `sum_of_outflows`: the sum of all the transactions below 0.
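
As a quick check of the two definitions, the following plain-Python sketch (independent of Tecton) computes both values for the sample payload shown above. Only the standard `json` module is used.

```python
import json

# The sample payload from above, serialized as a JSON string.
payload = json.dumps({
    "TRANSACTIONS": [
        {"USER_ID": "miket", "AMT": 141.55, "TIMESTAMP": "2023-01-10 11:05:21"},
        {"USER_ID": "miket", "AMT": -2000.00, "TIMESTAMP": "2023-01-10 13:43:09"},
        {"USER_ID": "miket", "AMT": 317.95, "TIMESTAMP": "2023-01-10 12:27:57"},
        {"USER_ID": "miket", "AMT": -500.00, "TIMESTAMP": "2023-01-10 19:19:32"},
        {"USER_ID": "miket", "AMT": 411.19, "TIMESTAMP": "2023-01-10 21:51:46"},
    ],
    "CREDIT_SCORE": 743,
})

parsed = json.loads(payload)

# 743 is above 730, so the binary feature is 1.
credit_score_is_high = int(parsed["CREDIT_SCORE"] > 730)

# Only the negative amounts (-2000.00 and -500.00) count as outflows.
sum_of_outflows = sum(t["AMT"] for t in parsed["TRANSACTIONS"] if t["AMT"] < 0)

print(credit_score_is_high, sum_of_outflows)  # prints 1 -2500.0
```

Note that "above 730" is a strict comparison, so a score of exactly 730 yields 0.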

### Declaring Request Input and ODFV Output Schemas

This feature view needs one input:

1. The JSON payload coming from the third-party API

The payload is provided in the Tecton API call as a string, so we define a `RequestSource` object that serves as the data source for our ODFV. The `RequestSource` specifies the expected schema of the ODFV's real-time inputs.

We also need to declare the schema of our output features.  In this case, `credit_score_is_high` is of type `Int64`, while `sum_of_outflows` is of type `Float64`.

Below, we'll use Tecton types to declare what the input request schema provides and what the output schema looks like.

In [1]:
from tecton import RequestSource
from tecton.types import Float64, Int64, Field, String

request_schema = [Field('payload', String)]
transaction_request = RequestSource(schema=request_schema)

output_schema = [
  Field('credit_score_is_high', Int64),
  Field('sum_of_outflows', Float64)
]

### Defining the ODFV function


Now we can define, validate, and test our On-Demand Feature View locally in this notebook against mock inputs. For this Feature View, the mode is set to `python`, which means that the input and output objects are dictionaries.

In [2]:
from tecton import on_demand_feature_view

@on_demand_feature_view(
  sources=[transaction_request],
  mode='python',
  schema=output_schema,
)
def odfv_payload_features(transaction_request):
    import json
    import pandas

    response_parsed = json.loads(transaction_request['payload'])

    credit_score_is_high = 0
    if 'CREDIT_SCORE' in response_parsed:
        if response_parsed['CREDIT_SCORE'] > 730:
            credit_score_is_high = 1
  
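    # Default to a float so the value matches the Float64 output schema
    sum_of_outflows = 0.0
    # .get() also skips an empty transaction list, which would have no 'AMT' column
    if response_parsed.get('TRANSACTIONS'):
        df = pandas.json_normalize(response_parsed['TRANSACTIONS'])
        series_outflow_amounts = df[df['AMT'] < 0]['AMT']

        if len(series_outflow_amounts) > 0:
            # Cast to a built-in float for the returned dictionary
            sum_of_outflows = float(series_outflow_amounts.sum())

    return {
        'credit_score_is_high': credit_score_is_high,
        'sum_of_outflows': sum_of_outflows,
    }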

odfv_payload_features.validate()

OnDemandFeatureView 'odfv_payload_features': Validating 1 dependency.
    Transformation 'odfv_payload_features': Successfully validated.
OnDemandFeatureView 'odfv_payload_features': Successfully validated.


### Testing the ODFV against mock inputs
There are now several ways we can test this ODFV. One is by providing mock inputs and calling the `run` function.

In [3]:
import json
request_dict = \
{
  "TRANSACTIONS": [
    {"USER_ID": "john", "AMT": -100.00, "TIMESTAMP": "2023-01-10 11:05:21"},
    {"USER_ID": "john", "AMT": -300.00, "TIMESTAMP": "2023-01-10 13:43:09"},
    {"USER_ID": "john", "AMT": 23.97, "TIMESTAMP": "2023-01-10 12:27:57"}
  ],
  "CREDIT_SCORE": 691
}

request_payload = json.dumps(request_dict)

In [4]:
odfv_payload_features.run(transaction_request={'payload': request_payload})

{'credit_score_is_high': 0, 'sum_of_outflows': -400.0}

### 2. ODFV 2: Number of days between the user's last transaction and the current transaction

On-Demand Feature Views can depend on precomputed (Batch or Streaming) features stored in the offline and online stores. This supports scenarios that combine real-time data with Batch/Streaming features, for example, checking whether the current transaction amount is above the user's average transaction amount over the last 30 days.

In our example, we will compute the number of days between a user's last transaction and the current transaction being processed. In order to compute this feature, we will need to read data in from 2 data sources:
* The last transaction date prior to the current one can be pulled from a Batch Feature View or, in some cases, a Streaming Feature View. In this tutorial, we will create a Batch Feature View to compute this feature.
* The current transaction date will come from real-time data, so we will create a corresponding `RequestSource`.

*❓Can we just use `current_timestamp()` for the request?*  **No**, we cannot - because this would not work when we are doing point-in-time-correct historic generation of datasets.  We want those operations to use the timestamp of the request in the past when it was made, so this is the value we must pass in.  Using something like `current_timestamp()` would break the proper math when doing historical time travel.
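
The sketch below illustrates the problem in plain Python. To keep it deterministic, the wall clock is simulated with fixed dates standing in for two different days on which a training job might run; `days_since` is a hypothetical helper, not a Tecton function.

```python
from datetime import datetime

def days_since(last_txn, as_of):
    """Whole days between the previous transaction and a reference time."""
    return (as_of - last_txn).days

last_txn = datetime(2023, 5, 17)
request_time = datetime(2023, 5, 28)  # when the historical event actually occurred

# Point-in-time correct: uses the timestamp carried by the event itself.
print(days_since(last_txn, request_time))  # prints 11

# Wall-clock based: the answer depends on when the training job happens to run.
run_1 = days_since(last_txn, datetime(2023, 8, 1))
run_2 = days_since(last_txn, datetime(2024, 1, 1))
print(run_1 == run_2)  # prints False
```

With the request timestamp passed in, the same historical event always produces the same feature value, no matter when the dataset is generated.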

#### 2.1) Creating the last transaction date Batch Feature View

In [1]:
#Details were sent in an email
%env SNOWFLAKE_USER=DEMO_USER
%env SNOWFLAKE_PASSWORD=tecton123!
%env SNOWFLAKE_ACCOUNT=tectonpartner-tecton_demo_usaa

# Import Tecton and other libraries
import logging
import os
import tecton
from dotenv import load_dotenv, find_dotenv
import pandas as pd
import snowflake.connector
from datetime import datetime, timedelta
from pprint import pprint

load_dotenv()  # take environment variables from .env.
logging.getLogger('snowflake.connector').setLevel(logging.WARNING)
logging.getLogger('snowflake.snowpark').setLevel(logging.WARNING)

connection_parameters = {
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "warehouse": "TRIAL_WAREHOUSE",
    # Database and schema are required so that Tecton can create various temporary objects
    "database": "USAA_DEMO",
    "schema": "PUBLIC",
}

conn = snowflake.connector.connect(**connection_parameters)
tecton.snowflake_context.set_connection(conn) # Tecton will use this Snowflake connection for all interactive queries


# Quick helper function to query snowflake from a notebook
# Make sure to replace with the appropriate connection details for your own account
def query_snowflake(query):
    df = conn.cursor().execute(query).fetch_pandas_all()
    return df

tecton.version.summary()

env: SNOWFLAKE_USER=DEMO_USER
env: SNOWFLAKE_PASSWORD=tecton123!
env: SNOWFLAKE_ACCOUNT=tectonpartner-tecton_demo_usaa
Version: 0.7.2
Git Commit: b1f0847f4a680bb6307d53150af3589d3fad57e0
Build Datetime: 2023-07-28T21:12:30


In [7]:
ws = tecton.get_workspace('prod')
user = ws.get_entity('fraud_user')
transactions = ws.get_data_source('transactions')

In [8]:
from tecton import batch_feature_view 

# make a BFV for transactions
@batch_feature_view(
  sources=[transactions],
  entities=[user],
  mode='snowflake_sql',
  batch_schedule=timedelta(days=1),
  feature_start_time=datetime(2023, 1, 1),
  timestamp_field='TIMESTAMP',
  ttl=timedelta(days=365)
)
def last_transaction(transactions_batch):
    return f'''
    SELECT USER_ID, 
    AMT as LAST_TRANSACTION_AMOUNT,
    cast(TIMESTAMP as string) as LAST_TRANSACTION_TIMESTAMP,  --we need to alias timestamp and make it a string to make it a feature
    TIMESTAMP
    FROM {transactions_batch}
  '''

last_transaction.validate()

BatchFeatureView 'last_transaction': Validating 1 of 3 dependencies. (2 already validated)
    Transformation 'last_transaction': Successfully validated.
BatchFeatureView 'last_transaction': Successfully validated.


In [9]:
last_transaction.get_historical_features(
    start_time=datetime(2023, 1, 1), 
    end_time=datetime(2023, 6, 1)).to_pandas().head()

Unnamed: 0,USER_ID,LAST_TRANSACTION_AMOUNT,LAST_TRANSACTION_TIMESTAMP,TIMESTAMP
0,user_402539845901,9.71,2023-01-01 02:35:31.588,2023-01-01 02:35:31.588860
1,user_222506789984,5.17,2023-01-01 02:35:34.274,2023-01-01 02:35:34.274670
2,user_502567604689,53.97,2023-01-01 02:35:36.896,2023-01-01 02:35:36.896390
3,user_538895124917,516.54,2023-01-01 02:35:38.574,2023-01-01 02:35:38.574367
4,user_268514844966,227.8,2023-01-01 02:35:41.087,2023-01-01 02:35:41.087538


#### 2.2) Creating the On-Demand Feature View

💡 Notice how the `last_transaction` BFV we defined earlier is now used as a source for the ODFV. Tecton will automatically look up the right feature value based on the entity key provided in the request to Tecton.

In [9]:
request_schema = [Field('REQUEST_TIMESTAMP', String)]
request = RequestSource(schema=request_schema)
output_schema = [Field('DAYS_SINCE_LAST_TRANSACTION', Int64)]

@on_demand_feature_view(
    sources=[request, last_transaction],
    mode='python',
    schema=output_schema
)
def odfv_days_since_last_txn(request, last_transaction):
    from datetime import datetime

    if last_transaction['LAST_TRANSACTION_TIMESTAMP']:
        # We have a value from the feature store: parse both timestamp strings
        # and return the number of whole days between them
        request_datetime = datetime.strptime(request['REQUEST_TIMESTAMP'], '%Y-%m-%d %H:%M:%S.%f')
        transaction_datetime = datetime.strptime(last_transaction['LAST_TRANSACTION_TIMESTAMP'], '%Y-%m-%d %H:%M:%S.%f')
        td = request_datetime - transaction_datetime
        return {'DAYS_SINCE_LAST_TRANSACTION': td.days}
    else:
        # No prior transaction: return -1 as a sentinel value
        return {'DAYS_SINCE_LAST_TRANSACTION': -1}
    
odfv_days_since_last_txn.validate()

OnDemandFeatureView 'odfv_days_since_last_txn': Validating 1 of 2 dependencies. (1 already validated)
    Transformation 'odfv_days_since_last_txn': Successfully validated.
OnDemandFeatureView 'odfv_days_since_last_txn': Successfully validated.


✅ We can test this On-Demand Feature View with mock inputs. Refer to our documentation for more details on [interactive testing of ODFVs with dependencies](https://docs.tecton.ai/docs/testing-features/interactive-testing/testing-on-demand-features#on-demand-feature-views-with-feature-view-dependencies).

When testing this ODFV, we have to provide mock inputs for all of its sources, including the `last_transaction` BFV.

In [10]:
odfv_days_since_last_txn.run(
    request={'REQUEST_TIMESTAMP': '2023-05-28 00:00:00.000'}, 
    last_transaction={'LAST_TRANSACTION_TIMESTAMP': '2023-05-17 00:00:00.000'})

{'DAYS_SINCE_LAST_TRANSACTION': 11}
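
Two properties of this arithmetic are worth seeing in isolation. The sketch below uses only the standard library and repeats the parse-subtract-`.days` steps from the ODFV; `whole_days` is a hypothetical helper name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def whole_days(request_ts, last_ts):
    """Parse both timestamp strings, subtract them, and take .days."""
    return (datetime.strptime(request_ts, FMT) - datetime.strptime(last_ts, FMT)).days

# 10 days and 12 hours apart: .days drops the partial day.
print(whole_days("2023-05-28 00:00:00.000", "2023-05-17 12:00:00.000"))  # prints 10

# Request 12 hours earlier than the stored transaction: .days is -1,
# the same value the ODFV returns to signal "no prior transaction".
print(whole_days("2023-05-17 00:00:00.000", "2023-05-17 12:00:00.000"))  # prints -1
```

If a request timestamp can ever precede the stored transaction timestamp, a different sentinel (or an explicit check) may be needed to keep the two cases distinguishable.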

We can also test the ODFV against a spine of training events using `get_historical_features()`.
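
Conceptually, running against a spine means that each spine row is paired with the feature value that was valid at that row's timestamp. The sketch below shows that idea in plain Python; it is not Tecton's implementation. The `feature_rows` data and `lookup_as_of` helper are hypothetical, and the "at or before" boundary (with no TTL handling) is an assumption made for this sketch.

```python
from datetime import datetime

# Hypothetical feature rows: (USER_ID, TIMESTAMP, LAST_TRANSACTION_TIMESTAMP).
feature_rows = [
    ("user_1", datetime(2023, 5, 1), "2023-05-01 00:00:00.000"),
    ("user_1", datetime(2023, 5, 17), "2023-05-17 00:00:00.000"),
    ("user_1", datetime(2023, 6, 2), "2023-06-02 00:00:00.000"),
]

def lookup_as_of(user_id, event_time, rows=feature_rows):
    """Return the newest feature value at or before event_time, else None."""
    candidates = [r for r in rows if r[0] == user_id and r[1] <= event_time]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[1])[2]

# Each spine row carries an entity key and the time of the training event.
spine = [("user_1", datetime(2023, 5, 28)), ("user_1", datetime(2023, 4, 1))]
for user_id, event_time in spine:
    print(user_id, event_time.date(), lookup_as_of(user_id, event_time))
```

The first spine row picks up the 2023-05-17 value rather than the later 2023-06-02 one, and the second row finds no value at all, which is the case the ODFV's `-1` branch handles.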

### 3) Publishing the ODFVs to Tecton 
✅  To add these features to Tecton, add them to a new file in your Tecton Feature Repository and run `tecton plan` and `tecton apply`.
We can now retrieve these feature views in our notebook and generate training data from a spine!

In [2]:
transactions_query = '''
SELECT 
    MERCHANT,
    USER_ID,
    CATEGORY,
    TIMESTAMP,
    AMT,
    cast(TIMESTAMP as string) as REQUEST_TIMESTAMP,
    IS_FRAUD
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS 
ORDER BY TIMESTAMP DESC
LIMIT 10
'''
transactions = query_snowflake(transactions_query)
transactions.head(5)

Unnamed: 0,MERCHANT,USER_ID,CATEGORY,TIMESTAMP,AMT,REQUEST_TIMESTAMP,IS_FRAUD
0,"fraud_Haley, Jewess and Bechtelar",user_459842889956,shopping_pos,2023-08-03 20:55:36.156244,1.92,2023-08-03 20:55:36.156,0
1,fraud_Hamill-D'Amore,user_650387977076,health_fitness,2023-08-03 20:55:33.612956,53.08,2023-08-03 20:55:33.612,0
2,fraud_Durgan-Auer,user_394495759023,misc_net,2023-08-03 20:55:30.557317,5.57,2023-08-03 20:55:30.557,0
3,"fraud_McCullough, Hudson and Schuster",user_461615966685,food_dining,2023-08-03 20:55:27.434522,35.37,2023-08-03 20:55:27.434,0
4,fraud_Bernhard Inc,user_884240387242,gas_transport,2023-08-03 20:55:25.250398,78.23,2023-08-03 20:55:25.250,0


In [6]:
fv = tecton.get_workspace('demo-vince').get_feature_view('transaction_amount_is_high')

fv.get_historical_features(transactions_query).to_pandas().head(10)

Unnamed: 0,MERCHANT,USER_ID,CATEGORY,TIMESTAMP,AMT,REQUEST_TIMESTAMP,IS_FRAUD,TRANSACTION_AMOUNT_IS_HIGH
0,"fraud_Haley, Jewess and Bechtelar",user_459842889956,shopping_pos,2023-08-03 20:55:36.156244,1.92,2023-08-03 20:55:36.156,0,
1,fraud_Hamill-D'Amore,user_650387977076,health_fitness,2023-08-03 20:55:33.612956,53.08,2023-08-03 20:55:33.612,0,
2,fraud_Durgan-Auer,user_394495759023,misc_net,2023-08-03 20:55:30.557317,5.57,2023-08-03 20:55:30.557,0,
3,"fraud_McCullough, Hudson and Schuster",user_461615966685,food_dining,2023-08-03 20:55:27.434522,35.37,2023-08-03 20:55:27.434,0,
4,fraud_Bernhard Inc,user_884240387242,gas_transport,2023-08-03 20:55:25.250398,78.23,2023-08-03 20:55:25.250,0,
5,fraud_Kub PLC,user_912293302206,personal_care,2023-08-03 20:55:20.858521,11.68,2023-08-03 20:55:20.858,0,
6,fraud_Rempel Inc,user_650387977076,shopping_net,2023-08-03 20:55:18.432924,3.29,2023-08-03 20:55:18.432,0,
7,fraud_Hackett-Lueilwitz,user_650387977076,grocery_pos,2023-08-03 20:55:14.864775,22.72,2023-08-03 20:55:14.864,0,
8,fraud_Hintz-Bruen,user_650387977076,grocery_net,2023-08-03 20:55:12.984877,31.39,2023-08-03 20:55:12.984,0,
9,fraud_Jewess LLC,user_574612776685,shopping_pos,2023-08-03 20:55:10.962614,1.52,2023-08-03 20:55:10.962,0,
