# 2. Building Features with Tecton and Snowflake

In this tutorial, we'll use Tecton and Snowflake to build features for machine learning. We'll cover:
* How features are written in Tecton
* How to use Notebook Driven Development (NDD) to declare and test out a new feature in local code
* How to register features with Tecton
* How to use Tecton Aggregations to do easy window aggregations

## 0. Setup

✅ Run the cells below.

In [3]:
import logging
import os
import tecton
from dotenv import load_dotenv
import pandas as pd
import snowflake.connector
from datetime import date, datetime, timedelta
from pprint import pprint

In [4]:
load_dotenv()  # take environment variables from .env.
logging.getLogger('snowflake.connector').setLevel(logging.WARNING)
logging.getLogger('snowflake.snowpark').setLevel(logging.WARNING)

connection_parameters = {
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "warehouse": "TRIAL_WAREHOUSE",
    # Tecton requires a database and schema to create various temporary objects
    "database": "TECTON_DEMO_DATA",
    "schema": "FRAUD_DEMO",
}
conn = snowflake.connector.connect(**connection_parameters)
tecton.snowflake_context.set_connection(conn) # Tecton will use this Snowflake connection for all interactive queries

# Quick helper function to query snowflake from a notebook
# Make sure to replace with the appropriate connection details for your own account
def query_snowflake(query):
    df = conn.cursor().execute(query).fetch_pandas_all()
    return df

ws = tecton.get_workspace('prod')
tecton.version.summary()

Version: 0.7.0b29
Git Commit: 4421324c8d9880367529bc978d7fa27b044b6fa7
Build Datetime: 2023-05-16T23:03:03
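
If the connection cell above fails with a `KeyError`, one of the Snowflake credentials is probably missing from your `.env` file. The following is a minimal sketch, not part of the Tecton SDK, that reports which of the three variables read by the setup cell are unset. The helper name `missing_env_vars` is hypothetical.

```python
import os

# The three variables the setup cell reads from the environment.
REQUIRED_VARS = ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Snowflake credentials are set.")
```

Run it after `load_dotenv()` so that values from `.env` are already loaded into the environment.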


### ❓ Before we start -- Tecton Workspaces

[Workspaces](https://docs.tecton.ai/overviews/workspaces.html) are sandbox environments that can be used for experimenting with a Feature Repo without affecting the production environment. Changes made in one workspace have no effect on other workspaces.

By default, new "development" workspaces do not have access to materialization and storage resources. Instead, transformations can be run ad-hoc in your Snowflake Warehouse. This means that the Tecton SDK builds a query that reads directly from your raw data tables, and executes it in your Snowflake Warehouse.

This ad-hoc computation functionality can be used in any workspace and allows you to easily test features without needing to backfill and materialize data to the Feature Store.

New workspaces with full materialization and storage resources can be created by adding the `--live` flag to the `tecton workspace create` CLI command shown below. This can be useful for creating staging environments to test features online before pushing changes to prod, or for isolating different teams from one another.

**In this tutorial, we'll create a new workspace to ensure our changes don't affect others' workloads.**

### ✅ Create your own Tecton Workspace
In this tutorial, we'll create a new [Workspace](https://docs.tecton.ai/docs/setting-up-tecton/administration-setup/creating-a-workspace-and-adding-users-to-the-workspace) to test our changes.

Workspaces are created using the Tecton CLI. Create one now by running `tecton workspace create MY_WORKSPACE`:

```
$ tecton workspace create MY_WORKSPACE
Created workspace "MY_WORKSPACE".
Switched to workspace "MY_WORKSPACE".

You're now on a new, empty workspace. Workspaces isolate their state,
so if you run "tecton plan" Tecton will not see any existing state
for this configuration.
```

> 💡 **Tip:** For a complete list of workspace commands, run `tecton workspace -h`.

Then, grab a reference to the new Workspace, which we'll use later.

In [5]:
ws = tecton.get_workspace("MY_WORKSPACE")

### ✅ Clone the Sample Feature Repo
In Tecton, a [feature repository](https://docs.tecton.ai/docs/introduction/tecton-concepts#feature-repository) is a collection of declarative Python files that define feature pipelines. In this tutorial, we'll clone a pre-populated feature repository to use as a starting point.

The [sample feature repository for this demo can be found here](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo). If you checked out this git repository to get a copy of this tutorial, you should already have the important files downloaded. If not, clone the sample repository.

### ✅ Apply the Sample Feature Repo

To register a local feature repository with Tecton, [you'll use the Tecton CLI.](https://docs.tecton.ai/examples/managing-feature-repos.html) Since you are working in a new Workspace, it does not currently have anything registered, so your first time adding features should be simple.

Navigate to the feature repository's directory in the command line:
```
cd feature_repo
```


Then run the following command to register your feature definitions with Tecton:
```
tecton apply
```


Take note of the workspace you are applying to and make sure it is correct. Then go ahead and apply the plan with `y`.

> 💡 **Tip:** You can always compare your local Feature Repo to the remote Feature Registry before applying it by running `tecton plan`.

## 1. Constructing a new feature
Let's start by building a simple feature -- **the amount of the last transaction a user made**. First, let's run a query against the raw data in Snowflake (feel free to run this yourself in a Snowflake worksheet as well).

In [6]:
# Preview the data directly
user_transaction_amount_query = '''
SELECT 
    USER_ID,
    AMT,
    TIMESTAMP
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS LIMIT 10 
'''
user_transaction_amount = query_snowflake(user_transaction_amount_query)
user_transaction_amount.head(5)

|   | USER_ID | AMT | TIMESTAMP |
|---|---|---|---|
| 0 | user_884240387242 | 65.01 | 2022-04-07 22:49:35.650528 |
| 1 | user_502567604689 | 12.06 | 2022-04-07 22:49:41.102171 |
| 2 | user_916905857181 | 49.53 | 2022-04-07 22:49:43.476484 |
| 3 | user_939970169861 | 9.57 | 2022-04-07 22:49:45.248740 |
| 4 | user_394495759023 | 3.32 | 2022-04-07 22:49:47.130003 |


In Tecton, a feature has **three key components**:
1. A set of keys that specify who or what the feature is describing (associated with an [Entity](https://docs.tecton.ai/overviews/framework/entities.html)). In the above example, the key is `USER_ID`, meaning this feature is describing a property about a user.
2. One or more feature values -- the stuff that's going to eventually get passed into a model.  In the above example, the feature is `AMT`, the amount of the transaction.
3. A timestamp for the feature value. In the above example, the timestamp is `TIMESTAMP`, signifying that the feature is valid as of the moment of the transaction.
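
To make the three components concrete, here is a minimal plain-Python sketch (not Tecton code) that derives "the amount of the last transaction a user made" from a handful of rows. The rows are made-up sample values and the function name `last_transaction_amount` is hypothetical. Each result pairs a key (`USER_ID`) with a feature value (`AMT`) and the timestamp at which that value became valid.

```python
from datetime import datetime

# Made-up sample rows (USER_ID, AMT, TIMESTAMP); not taken from the demo table.
rows = [
    ("user_a", 10.0, datetime(2022, 4, 7, 10, 0)),
    ("user_b", 25.5, datetime(2022, 4, 7, 11, 0)),
    ("user_a", 42.0, datetime(2022, 4, 8, 9, 30)),
]


def last_transaction_amount(rows):
    """Return {user_id: (amount, timestamp)} for each user's most recent row."""
    latest = {}
    for user_id, amt, ts in rows:
        # Keep the row with the greatest timestamp seen so far for this key.
        if user_id not in latest or ts > latest[user_id][1]:
            latest[user_id] = (amt, ts)
    return latest


features = last_transaction_amount(rows)
for user_id, (amt, ts) in sorted(features.items()):
    print(user_id, amt, ts.isoformat())
```

In Tecton you do not write this "latest row" logic yourself: the feature view selects the raw key, value, and timestamp columns, and the platform resolves which value applies at a given time.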


### 1.1) Leverage existing registered Tecton objects

Our logic looks good, so let's create a new feature now. We'll first create a feature within the scope of this notebook, leveraging some of the existing Tecton objects, notably an existing data source and entity.

💡 Using the Tecton SDK, we can see the existing objects in this workspace. We will use the `transactions` data source and the `fraud_user` entity.

In [7]:
ws.list_data_sources()

['transactions', 'users']

In [8]:
ws.list_entities()

['category', 'fraud_user', 'merchant']

### 1.2) Define a Feature using Tecton
Moving from your Snowflake query to a Tecton feature is straightforward: you wrap the SQL query in a Tecton Python decorator. Here's what it looks like in practice:

```python
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode='snowflake_sql',
    online=True,
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=30),
    feature_start_time=datetime(2021, 5, 20),
    description='Last user transaction amount (batch calculated)'
)
def user_last_transaction_amount(transactions):
    return f'''
        SELECT
            USER_ID,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        '''
```

### 1.3) Create a notebook scoped Batch Feature View (BFV)

Any Tecton object can be defined and validated in a notebook. We call these definitions "local objects". This makes it easier to develop and validate Tecton Feature Views without going back and forth between your notebook and your Tecton repository.
Simply write the definition in a notebook cell and call `.validate()` on the object. Tecton will ensure the definition is correct and run automatic schema validations on feature pipelines.

💡 Local objects can depend on remote objects that are registered with Tecton workspaces. For example, in the Batch Feature View below, we will use the `transactions` data source and the `fraud_user` entity that have already been pushed to Tecton.


In [9]:
# Retrieving remote objects from our workspace
transactions = ws.get_data_source('transactions')
user = ws.get_entity('fraud_user')

In [11]:
from tecton import batch_feature_view

@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode='snowflake_sql',
    online=True,
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=30),
    feature_start_time=datetime(2021, 5, 20),
    description='Last user transaction amount (batch calculated)'
)
def user_last_transaction_amount(transactions):
    return f'''
        SELECT
            USER_ID,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        '''

user_last_transaction_amount.validate()

BatchFeatureView 'user_last_transaction_amount': Validating 1 of 3 dependencies. (2 already validated)
    Transformation 'user_last_transaction_amount': Successfully validated.
BatchFeatureView 'user_last_transaction_amount': Successfully validated.


### 1.4) Test Batch Feature View (BFV) interactively
✅ Once our Feature View is validated we can test it interactively. See our documentation on interactive testing of batch features [🔗Link](https://docs.tecton.ai/docs/testing-features/interactive-testing/testing-batch-features) 

For example, we might want to run the `.get_historical_features()` method to generate feature values for a time range.

In [13]:
start_time = datetime.utcnow()-timedelta(days=60)
end_time = datetime.utcnow()

user_last_transaction_amount.get_historical_features(start_time=start_time, end_time=end_time).to_pandas().head()

|   | USER_ID | AMT | TIMESTAMP |
|---|---|---|---|
| 0 | user_457435146833 | 108.22 | 2023-05-19 18:07:52.145171 |
| 1 | user_91355675520 | 17.56 | 2023-05-19 18:07:54.858599 |
| 2 | user_934384811883 | 164.37 | 2023-05-19 18:07:58.038211 |
| 3 | user_939970169861 | 88.97 | 2023-05-19 18:08:00.034362 |
| 4 | user_499975010057 | 50.53 | 2023-05-19 18:08:01.981921 |


### 1.5) Combine this new feature with a set of existing features from a workspace and create a new feature set

After creating a new feature you may want to test it in a new feature set for a model. You can do this by creating a local Feature Service object. As needed, additional features can be fetched from a workspace and added to the new Feature Service.

Commonly you may want to fetch a feature set from an existing Feature Service and add your new feature to it. You can get the list of features in a Feature Service by calling `.features` on it and then include that list in a new local Feature Service.

In [18]:
from tecton import FeatureService

feature_list = ws.get_feature_service('fraud_detection_feature_service').features
fraud_detection_v2 = FeatureService(name="fraud_detection_v2", features=feature_list + [user_last_transaction_amount])

fraud_detection_v2.validate()

FeatureService 'fraud_detection_v2': Successfully validated.


### 1.6) Generate training data to test the new feature in a model
Training data can be generated for a list of training events by calling `get_historical_features(spine=training_events)` on a Feature Service. Tecton will join in the historically accurate value of each feature for each event in the provided spine.

Feature values will be fetched from the Offline Store if they have been materialized offline and computed on the fly if not.
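
The idea of joining "the historically accurate value" for each spine event can be illustrated with a small plain-Python sketch. This is a conceptual illustration only, not how Tecton implements the join: the function name `point_in_time_value`, the sample data, and the choice to include a value stamped exactly at the event time are all assumptions of this sketch. For each event, it picks the latest feature value stamped at or before the event time and discards it if it is older than the `ttl`.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_value(history, event_time, ttl):
    """Return the latest value stamped at or before event_time, or None.

    history: list of (timestamp, value) pairs sorted by timestamp.
    ttl: values older than this relative to event_time are treated as expired.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time) - 1  # last row with ts <= event_time
    if idx < 0:
        return None  # no feature value existed yet at event_time
    ts, value = history[idx]
    if event_time - ts > ttl:
        return None  # the most recent value is older than the TTL
    return value


# Made-up feature history for a single user: (TIMESTAMP, AMT)
history = [
    (datetime(2023, 1, 1), 10.0),
    (datetime(2023, 1, 10), 20.0),
]
ttl = timedelta(days=30)

# Made-up spine event times for the same user
spine = [
    datetime(2022, 12, 31),
    datetime(2023, 1, 5),
    datetime(2023, 1, 15),
    datetime(2023, 3, 1),
]
joined = [point_in_time_value(history, t, ttl) for t in spine]
print(joined)  # [None, 10.0, 20.0, None]
```

The first event predates any feature value and the last one falls outside the 30-day TTL, so both receive no value. This is what prevents future information from leaking into training rows.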

In [19]:
# Generating a spine from a snowflake query
transactions_query = '''
SELECT 
    MERCHANT,
    USER_ID,
    CATEGORY,
    TIMESTAMP,
    IS_FRAUD
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS 
ORDER BY TIMESTAMP DESC
LIMIT 100
'''

# Generating training dataset from the locally defined Feature Service.
training_data = fraud_detection_v2.get_historical_features(spine=transactions_query, timestamp_key="TIMESTAMP").to_pandas()
training_data.head(10)


Unnamed: 0,USER_ID,CATEGORY,TIMESTAMP,MERCHANT,IS_FRAUD,USER_TRANSACTION_METRICS__TRANSACTION_SUM_1D_1D,USER_TRANSACTION_METRICS__TRANSACTION_SUM_3D_1D,USER_TRANSACTION_METRICS__TRANSACTION_SUM_7D_1D,USER_TRANSACTION_METRICS__TRANSACTION_SUM_40D_1D,USER_TRANSACTION_METRICS__AMT_MEAN_1D_1D,USER_TRANSACTION_METRICS__AMT_MEAN_3D_1D,USER_TRANSACTION_METRICS__AMT_MEAN_7D_1D,USER_TRANSACTION_METRICS__AMT_MEAN_40D_1D,USER_CATEGORY_COUNT__TRANSACTION_SUM_1D_1D,USER_CATEGORY_COUNT__TRANSACTION_SUM_3D_1D,USER_CATEGORY_COUNT__TRANSACTION_SUM_7D_1D,USER_CATEGORY_COUNT__TRANSACTION_SUM_40D_1D,USER_LAST_TRANSACTION_AMOUNT__AMT
0,user_469998441571,travel,2023-07-18 17:53:14.337034,fraud_Kris-Kertzmann,0,688,2020,4771,27227,67.651526,62.464421,60.461629,60.611321,17,71,182,1124,58.19
1,user_402539845901,shopping_pos,2023-07-18 17:53:11.984440,"fraud_Daugherty, Pouros and Beahan",0,1217,3579,8429,48033,64.837543,64.965859,64.893744,64.095076,106,321,768,4305,64.81
2,user_950482239421,grocery_pos,2023-07-18 17:53:10.257271,fraud_O'Keefe-Hudson,0,400,1141,2680,15278,95.32755,92.991367,92.811571,93.779903,45,142,351,1897,77.83
3,user_644787199786,grocery_net,2023-07-18 17:53:04.876772,"fraud_Rutherford, Homenick and Bergstrom",0,514,1507,3647,20418,65.975389,61.575634,62.43045,62.410758,3,10,35,159,2.4
4,user_222506789984,travel,2023-07-18 17:53:02.477199,fraud_Fritsch LLC,0,2015,6138,14292,81450,70.966695,70.183016,69.87427,68.501804,59,196,462,2514,21.72
5,user_884240387242,grocery_pos,2023-07-18 17:52:57.596831,fraud_Kiehn-Emmerich,0,2144,6324,14524,81906,68.734664,70.914703,70.415227,72.092134,223,630,1470,8236,141.21
6,user_884240387242,entertainment,2023-07-18 17:52:55.200363,fraud_Dibbert and Sons,0,2144,6324,14524,81906,68.734664,70.914703,70.415227,72.092134,139,423,1026,5714,141.21
7,user_461615966685,entertainment,2023-07-18 17:52:52.217594,fraud_Upton PLC,0,681,2036,4763,27661,83.276637,81.325093,79.383011,81.124604,62,192,429,2363,3.22
8,user_939970169861,home,2023-07-18 17:52:50.050357,"fraud_Moore, Williamson and Emmerich",0,1310,3935,9468,54367,74.959076,64.70938,62.047396,60.493814,134,386,927,5227,49.59
9,user_939970169861,travel,2023-07-18 17:52:47.707060,"fraud_Kilback, Nitzsche and Leffler",0,1310,3935,9468,54367,74.959076,64.70938,62.047396,60.493814,47,137,303,1956,49.59


## 2. Registering the new Feature View definition with Tecton
Locally defined objects and helper functions can be copied directly into a Feature Repository for productionization. References to remote workspace objects should be changed to local definitions in the repo.

✅  To add this feature to Tecton, simply add it to a new file in your Tecton Feature Repository. **For your convenience, you can find this feature implemented (and commented out) [in this file](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo/blob/main/feature_repo/features/batch_feature_views/user_last_transaction_amount.py)**.

✅ Now that the code repo has been updated, it can be published to Tecton. Running `tecton workspace list` should list the workspaces on the server and show that the client is currently connected to the `MY_WORKSPACE` workspace created above. If not, run `tecton workspace select MY_WORKSPACE` to connect to it. Then run `tecton plan` to compare the code repo to the workspace, run `tecton apply` to generate the plan, and enter `y` at the prompt to push the changes. The new feature should be detected and created.

<pre>
>tecton apply
Using workspace "vince-dev" on cluster https://demo-buddy.tecton.ai
✅ Imported 13 Python modules from the feature repository
✅ Imported 13 Python modules from the feature repository
⚠️  Running Tests: No tests found.
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Finished generating plan.
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  + Create Transformation
    name:           user_last_transaction_amount
    description:    Last user transaction amount (batch calculated)

  + Create Batch Feature View
    name:           user_last_transaction_amount
    description:    Last user transaction amount (batch calculated)
    warning:        This Feature View has materialization enabled, but no owner. Specifying an owner will provide a point of contact in case of an issue.
    warning:        This Feature View has online materialization enabled, but does not have monitoring configured. Use the `monitor_freshness` and `alert_email` fields to be alerted if there are issues with the materialization jobs.

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
 Generated plan ID is 10f24272cddc4c63af1013dc333139c1
 View your plan in the Web UI: https://demo-buddy.tecton.ai/app/vince-dev/plan-summary/10f24272cddc4c63af1013dc333139c1
 ⚠️  Objects in plan contain warnings.

Are you sure you want to apply this plan to: "vince-dev"? [y/N]> y
🎉 all done!
</pre>



Once the apply is done, the feature is registered in Tecton and can be discovered by other users on the platform. The feature can also be consumed, and in a live workspace, features can be materialized to the offline and online stores.

This tutorial uses a development workspace, which means Feature Views are recomputed from the data source every time `.get_historical_features()` is called.

In [21]:
ws.list_feature_views()

['merchant_fraud_rate',
 'user_category_count',
 'user_date_of_birth',
 'user_last_transaction_amount',
 'user_transaction_metrics']

In [22]:
fv = ws.get_feature_view('user_last_transaction_amount')

start_time = datetime.utcnow()-timedelta(days=60)
end_time = datetime.utcnow()

fv.get_historical_features(start_time=start_time, end_time=end_time).to_pandas().head()

|   | USER_ID | AMT | TIMESTAMP |
|---|---|---|---|
| 0 | user_884240387242 | 506.19 | 2023-05-19 21:34:30.534352 |
| 1 | user_871233292771 | 60.87 | 2023-05-19 21:34:33.086367 |
| 2 | user_930691958107 | 3.25 | 2023-05-19 21:34:34.636276 |
| 3 | user_26990816968 | 77.03 | 2023-05-19 21:34:37.448708 |
| 4 | user_656020174537 | 90.45 | 2023-05-19 21:34:40.064020 |


## 3. Using Tecton time-windowed aggregations
Sliding time-windowed aggregations are common ML features for event data, but defining them in a view can be error-prone and inefficient.

Tecton provides built-in implementations of common time-windowed aggregations that simplify transformation logic and ensure correct feature value computation. Additionally, Tecton optimizes the compute and storage of these aggregations to maximize efficiency.

For these reasons, we recommend using Tecton’s built-in aggregations whenever possible.

Time-windowed aggregations can be specified in the [Batch Feature View](https://docs.tecton.ai/docs/defining-features/feature-views/batch-feature-view/#creating-features-that-use-time-windowed-aggregations) decorator using the `aggregations` and `aggregation_interval` parameters.

For more details on available aggregation functions, visit our [Time-window aggregation functions reference](https://docs.tecton.ai/docs/0.5/defining-features/feature-views/time-window-aggregation-functions-reference).


Tecton expects the provided SQL query to select the raw events (with timestamps) to be aggregated.
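
As a conceptual illustration of what a time-windowed `max` computes, the following plain-Python sketch evaluates 1-day, 30-day, and 180-day windows over a few raw events for one user. It is not Tecton's implementation: the function name `windowed_max`, the made-up events, and the half-open window boundary (`start <= ts < end`) are assumptions of this sketch.

```python
from datetime import datetime, timedelta


def windowed_max(events, window_end, time_window):
    """Max amount over events with window_end - time_window <= ts < window_end."""
    window_start = window_end - time_window
    amounts = [amt for ts, amt in events if window_start <= ts < window_end]
    return max(amounts) if amounts else None


# Made-up raw events for a single user: (TIMESTAMP, AMT)
events = [
    (datetime(2023, 6, 1, 9), 50.0),
    (datetime(2023, 6, 1, 17), 120.0),
    (datetime(2023, 6, 20, 12), 300.0),
    (datetime(2023, 6, 30, 8), 80.0),
]

window_end = datetime(2023, 7, 1)
results = {
    days: windowed_max(events, window_end, timedelta(days=days))
    for days in (1, 30, 180)
}
print(results)  # {1: 80.0, 30: 300.0, 180: 300.0}
```

Writing this by hand for every window and every point in time is where errors and inefficiency tend to creep in. With the built-in aggregations, the SQL only needs to select the raw events.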

In [23]:
from tecton import Aggregation

@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode='snowflake_sql',
    online=True,
    feature_start_time=datetime(2021, 5, 20),
    description='Max transaction amounts for the user in various time windows',
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column='AMT', function='max', time_window=timedelta(days=1)),
        Aggregation(column='AMT', function='max', time_window=timedelta(days=30)),
        Aggregation(column='AMT', function='max', time_window=timedelta(days=180)),
    ],
)
def user_max_transactions(transactions):
    return f'''
        SELECT
            USER_ID,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        '''

user_max_transactions.validate()

BatchFeatureView 'user_max_transactions': Validating 1 of 3 dependencies. (2 already validated)
    Transformation 'user_max_transactions': Successfully validated.
BatchFeatureView 'user_max_transactions': Successfully validated.


In [24]:
user_max_transactions.get_historical_features(start_time=start_time, end_time=end_time).to_pandas().head()

|   | USER_ID | TIMESTAMP | AMT_MAX_1D_1D | AMT_MAX_30D_1D | AMT_MAX_180D_1D |
|---|---|---|---|---|---|
| 0 | user_650387977076 | 2023-06-02 | 2604.73 | 5786.68 | 5786.68 |
| 1 | user_91355675520 | 2023-06-07 | 710.79 | 1224.23 | 1224.23 |
| 2 | user_609904782486 | 2023-06-30 | 1223.23 | 4223.93 | 4223.93 |
| 3 | user_722584453020 | 2023-06-23 | 1269.26 | 7141.93 | 7141.93 |
| 4 | user_917975462998 | 2023-06-24 | 2939.5 | 2939.5 | 2939.5 |



✅  To add this feature to Tecton, simply add it to a new file in your Tecton Feature Repository. **For your convenience, you can find this feature implemented (and commented out) [in this file](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo/blob/main/feature_repo/features/batch_feature_views/user_max_transactions.py)**.

✅  Once you save your new feature, run `tecton apply` to publish it to Tecton.

Now we can test this feature below.

In [None]:
fv = ws.get_feature_view('user_max_transactions')

start_time = datetime.combine(date.today()-timedelta(days=180), datetime.min.time())
end_time = datetime.combine(date.today(), datetime.min.time())

fv.run(start_time=start_time, end_time=end_time).to_pandas().fillna(0).head()