# Example Feature View: 2 ways to implement the same feature

In this notebook we will run through 2 ways to implement an example feature. This will be a great way to put all of your new Tecton skills to the test

Imagine a scenario where you want to build a complex feature that creates bins based on a user's sum of transaction amounts for the previous month. This feature could power a model that runs in batch on a monthly basis, on the first day of each month.

In order to build the feature, we will use 2 methods:
- Using Incremental backfills (see notebook 4) and a custom SQL query to define the aggregation and the binning logic
- Using Tecton's Aggregation to define the aggregation logic (see notebook 2) and apply an On-demand Feature view (see notebook 3) on top of it to execute the binning logic on the fly.

### 1. Setting-up connections
Note we're downgrading tecton to 0.6.1 as there is a bug for incremental backfills in version 0.7.2. Don't forget to restart the kernel after running the below command

In [None]:
%pip install tecton[snowflake]==0.6.1

In [2]:
import logging
import os
import tecton
from dotenv import load_dotenv, find_dotenv
import pandas as pd
import snowflake.connector
from datetime import datetime, timedelta
from pprint import pprint

load_dotenv(find_dotenv())  # take environment variables from .env.

logging.getLogger('snowflake.connector').setLevel(logging.WARNING)
logging.getLogger('snowflake.snowpark').setLevel(logging.WARNING)

In [3]:
#Details were sent in an email
%env SNOWFLAKE_USER=DEMO_USER
%env SNOWFLAKE_PASSWORD=tecton123!
%env SNOWFLAKE_ACCOUNT=tectonpartner-tecton_demo_usaa

env: SNOWFLAKE_USER=DEMO_USER
env: SNOWFLAKE_PASSWORD=tecton123!
env: SNOWFLAKE_ACCOUNT=tectonpartner-tecton_demo_usaa


In [39]:
connection_parameters = {
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "warehouse": "TRIAL_WAREHOUSE",
    # Database and schema are required to create various temporary objects by tecton
    "database": "USAA_DEMO",
    "schema": "PUBLIC",
}
conn = snowflake.connector.connect(**connection_parameters)
tecton.snowflake_context.set_connection(conn) # Tecton will use this Snowflake connection for all interactive queries


# Quick helper function to query snowflake from a notebook
# Make sure to replace with the appropriate connection details for your own account
def query_snowflake(query):
    df = conn.cursor().execute(query).fetch_pandas_all()
    return df

print("dotenv location: " + find_dotenv())
tecton.version.summary()

dotenv location: 
Version: 0.6.1
Git Commit: 8cadbebe11bebac828a5103ccaf1ad792f16d50b
Build Datetime: 2023-03-15T17:57:29


### 2. First method using incremental backfills
Incremental backfills can't be tested interactively in a notebook so we will need to apply the feature view definition to our Tecton workspace and then test the feature view using .run()

create a file `users_monthly_total.py` in `feature_repo/features/batch_feature_views`

copy the following feature definition into the file:

```python
from tecton import batch_feature_view, materialization_context
from entities import user
from data_sources.transactions import transactions
from datetime import datetime, timedelta


@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="snowflake_sql",
    online=True,
    offline=True,
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2023, 8, 1),
    ttl=timedelta(days=30),
    incremental_backfills=True,
)
def transaction_sum_binned(transactions, context = materialization_context()):
    return f'''
    with PREVIOUS_MONTH_TOTALS as (
        select 
            USER_ID,
            TO_TIMESTAMP('{context.end_time}') - INTERVAL '1 MICROSECOND' AS TIMESTAMP,
            sum(AMT) as AMT_SUM
        from {transactions}
            where TIMESTAMP < TO_TIMESTAMP('{context.end_time}') -- since this would run on the first day of the month, context.end_time would be eg. 2023-08-01 00:00:00
            and TIMESTAMP >= last_day( TO_TIMESTAMP('{context.end_time}')-interval '2 month' ) + interval '1 day' -- this computes the first day of the previous month
        group by USER_ID, TIMESTAMP
)

select 
    USER_ID,
    TIMESTAMP,
    case 
        when AMT_SUM <100 then '0-100'
        when AMT_SUM >=100 and AMT_SUM<100 then '100-1000'
        else '1000+'
    end as LAST_MONTH_SPEND_BIN
    
    from PREVIOUS_MONTH_TOTALS LIMIT
    '''
```
Then, call `tecton plan` and `tecton apply` to push this new feature to your development workspace
Once `tecton apply` run successfully, we can retrieve the feature view from our workspace

In [46]:
ws = tecton.get_workspace('demo-vince') ## Replace with your workspace
fv = ws.get_feature_view('transaction_sum_binned')

TectonValidationError: Feature View 'transaction_sum_binned:v1' not found. Try running `workspace.list_feature_views()` to view all registered Feature Views.

**Note** Because the feature view has been applied to a dev workspace, there is not data in the offline store and we can't use get_historical_features() to produce historical feature values. For testing purposes, we can call .run() and run the feature logic for **1 materialization period**.

If you wanted to use this feature to generate training datasets or serve feature values to your production models, you would need to apply this feature view to a production workspace

In [40]:
fv.run(start_time=datetime(2023,7,31), end_time=datetime(2023,8,1)).to_pandas().head()

Unnamed: 0,USER_ID,TIMESTAMP,LAST_MONTH_SPEND_BIN
0,user_568801468984,2023-07-31 23:59:59.999999,0-100
1,user_871233292771,2023-07-31 23:59:59.999999,0-100
2,user_222506789984,2023-07-31 23:59:59.999999,0-100
3,user_722584453020,2023-07-31 23:59:59.999999,0-100
4,user_939970169861,2023-07-31 23:59:59.999999,1000+


### 3. Second method using Tecton's Aggregation framework + an On-demand Feature view
Note that compared to the previous method, this one will look back exactly 30 days prior to the day at which the materialization job runs. Which means that the aggregates are not always computed based on the first day of the previous month. Usually, this won't make a significant difference from a model performance standpoint.

Here, we build and test a Batch Feature View in the notebook

In [33]:
from tecton import batch_feature_view, Aggregation

transaction_source = ws.get_data_source('transactions')
user = ws.get_entity('fraud_user')


@batch_feature_view(
    sources=[transaction_source],
    entities=[user],
    mode="snowflake_sql",
    aggregations=[Aggregation(column='AMT', function='sum', time_window=timedelta(days=30))],
    aggregation_interval=timedelta(days=1)
)
def binned_sum(transactions):
    return f'''
    SELECT USER_ID, TIMESTAMP, AMT
    FROM {transactions}
    '''

binned_sum.validate()


BatchFeatureView 'binned_sum': Validating 1 of 3 dependencies. (2 already validated)
    Transformation 'binned_sum': Successfully validated.
BatchFeatureView 'binned_sum': Successfully validated.


In [18]:
binned_sum.get_historical_features(
start_time=datetime(2023,7,31),
end_time=datetime(2023,8,1)
).to_pandas().head()

INFO - 08/08/2023 12:30:54 AM - FeatureView - 1 Feature View is being computed directly from raw data sources.


Unnamed: 0,USER_ID,TIMESTAMP,AMT_SUM_30D_1D
0,user_722584453020,2023-07-31,2172690.08
1,user_884240387242,2023-07-31,4403373.43
2,user_504831693,2023-07-31,26977.66
3,user_950482239421,2023-07-31,1084055.5
4,user_212730160038,2023-07-31,485068.66


Our Batch Feature View only returns the sum of transaction amounts for the last 30 days, we need to apply further logic on top of the pre-computed feature to bin the feature values into different categories, we can do so using an On-Demand Feature View. In the ODFV source, we use the Batch Feature View we defined earlier to declare the dependency.

In [43]:
from tecton import on_demand_feature_view, RequestSource
from tecton.types import String, Field

@on_demand_feature_view(
    sources=[binned_sum],
    mode='python',
    schema=[Field('LAST_MONTH_SPEND_BIN',String)]
)
def binned_spend(binned_sum):
    if binned_sum['AMT_SUM_30D_1D']<100:
        return {'LAST_MONTH_SPEND_BIN':'0-100'}
    elif binned_sum['AMT_SUM_30D_1D']>=100 and binned_sum['AMT_SUM_30D_1D']<1000:
        return {'LAST_MONTH_SPEND_BIN':'100-1000'}
    else:
        return {'LAST_MONTH_SPEND_BIN':'1000+'}

binned_spend.validate()

OnDemandFeatureView 'binned_spend': Validating 1 of 2 dependencies. (1 already validated)
    Transformation 'binned_spend': Successfully validated.
OnDemandFeatureView 'binned_spend': Successfully validated.


Now we can test our new feature against a spine dataframe containing USER_IDs and transactions for August 1st 2023 using `get_historical_features()`

In [27]:
transactions_query = '''
SELECT 
    *
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS
WHERE TIMESTAMP <'2023-08-02' and TIMESTAMP>='2023-08-01'
ORDER BY TIMESTAMP DESC
LIMIT 50
'''
transactions = query_snowflake(transactions_query)

In [44]:
binned_spend.get_historical_features(transactions_query).to_pandas().head()

INFO - 08/09/2023 01:44:43 AM - FeatureView - 1 Feature View is being computed directly from raw data sources.
INFO - 08/09/2023 01:44:43 AM - FeatureView - 1 On-Demand Feature View is being computed ad hoc.


Unnamed: 0,USER_ID,TIMESTAMP,TRANSACTION_ID,CATEGORY,AMT,IS_FRAUD,MERCHANT,MERCH_LAT,MERCH_LONG,LAST_MONTH_SPEND_BIN
0,user_950482239421,2023-08-01 23:59:57.901819,43afe63f6909ba0ff40f38aef549f721,entertainment,69.8,0,"fraud_Parker, Nolan and Trantow",42.507625,-91.994007,1000+
1,user_884240387242,2023-08-01 23:59:56.082528,ce1a2a3bce28b811ef98c47ca9edea81,entertainment,5.24,0,"fraud_Effertz, Welch and Schowalter",42.959015,-79.653563,1000+
2,user_950482239421,2023-08-01 23:59:54.478604,21139ab8d43ce9a8e8fd90b5c68d0134,grocery_pos,111.08,0,fraud_Vandervort-Funk,42.479133,-92.85698,1000+
3,user_609904782486,2023-08-01 23:59:52.291035,c5cf400f6c750c6f275b494a1c1653c0,kids_pets,80.01,0,fraud_Ullrich Ltd,21.171019,-158.546336,1000+
4,user_724235628997,2023-08-01 23:59:49.901918,92803946af922a601bcb96e64aac6871,home,34.79,0,"fraud_Cole, Hills and Jewess",35.276528,-92.536872,1000+
