# 2. Building Features with Tecton and Snowflake

In this tutorial, we'll use Tecton and Snowflake to build features for machine learning. We'll cover:
* How to register features with Tecton
* How features are written in Tecton
* How to use Tecton's built-in aggregations to define time-windowed features

### ❓ Before we start -- Tecton Workspaces

[Workspaces](https://docs.tecton.ai/overviews/workspaces.html) are sandbox environments for experimenting with a Feature Repo without affecting the production environment. Changes made in one workspace have no effect on other workspaces.

By default, new "development" workspaces do not have access to materialization and storage resources. Instead, transformations can be run ad hoc in your Snowflake warehouse.

This ad-hoc computation functionality can be used in any workspace and allows you to easily test features without needing to backfill and materialize data to the Feature Store.

To create a workspace with full materialization and storage resources, add the `--live` flag to the `tecton workspace create` command shown below. Live workspaces are useful as staging environments for testing features online before pushing changes to prod, or for isolating different teams from each other.

**In this tutorial, we'll create a new workspace to ensure our changes don't affect others' workloads.**

### ✅ Create your own Tecton Workspace

Workspaces are created using the Tecton CLI. Let's make one now by running `tecton workspace create YOUR_NAME`:

```
$ tecton workspace create YOUR_NAME
Created workspace "YOUR_NAME".
Switched to workspace "YOUR_NAME".

You're now on a new, empty workspace. Workspaces isolate their state,
so if you run "tecton plan" Tecton will not see any existing state
for this configuration.
```

> 💡 **Tip:** For a complete list of workspace commands, run `tecton workspace -h`.

### ❓ Before we start -- Tecton Feature Repos

In Tecton, [features are declared as code](https://docs.tecton.ai/examples/managing-feature-repos.html), in a **Tecton Feature Repository**. When your team uses Tecton, in practice you'll be collaborating on a code repository that defines all of the features that you expect Tecton to manage.

That means that before we build a new feature, we'll need to clone the code repository that your team uses to collaborate on features. **In this tutorial, we'll clone a pre-populated feature repository to use as a starting point.**

### ✅ Clone the Sample Feature Repo

The [sample feature repository for this demo can be found here](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo). If you already checked out this git repository to get a copy of this tutorial, you already have the important files. If not, clone the sample repository; in the next steps you'll be editing files in that repo.

### ✅ Apply the Sample Feature Repo

To register a local feature repository with Tecton, [you'll use the Tecton CLI.](https://docs.tecton.ai/examples/managing-feature-repos.html) Since you are working in a new Workspace, it does not currently have anything registered, so your first time adding features should be simple.

Navigate to the feature repository in a command line, and run:
```
tecton apply
```


Take note of the workspace you are applying to and make sure it is correct. Then apply the plan by entering `y`.

> 💡 **Tip:** You can always compare your local Feature Repo to the remote Feature Registry by running `tecton plan`.

# Building your first feature

On to the fun part, let's build a feature in Tecton.

### Setup

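✅ Run the cell below. It reads your Snowflake connection parameters (`SNOWFLAKE_USER`, `SNOWFLAKE_PWD`, `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_WAREHOUSE`) from a `.env` file, so make sure that file contains your own account info.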

```python
import tecton
import pandas as pd
import snowflake.connector
from dotenv import dotenv_values
from datetime import datetime, timedelta
from pprint import pprint

# Load Snowflake connection details from the local .env file
env = {**dotenv_values(".env")}

# Quick helper function to query Snowflake from a notebook.
# Connection details come from .env; update that file for your own account.
def query_snowflake(query):
    # The connection is closed automatically when the `with` block exits
    with snowflake.connector.connect(
        user=env['SNOWFLAKE_USER'],
        password=env['SNOWFLAKE_PWD'],
        account=env['SNOWFLAKE_ACCOUNT'],
        warehouse=env['SNOWFLAKE_WAREHOUSE'],
    ) as conn:
        # fetch_pandas_all() requires the snowflake-connector-python[pandas] extra
        return conn.cursor().execute(query).fetch_pandas_all()

tecton.version.summary()
```

```
Version: 0.4.0b16
Git Commit: 7627b2f1c1a965a458440f3ce900a907671f2185
Build Datetime: 2022-04-05T23:41:41
```
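The `query_snowflake` helper raises a `KeyError` if `.env` is missing any of the keys it reads. A minimal standard-library sketch of an up-front check is below; the helper name `missing_snowflake_settings` is hypothetical, and the key names are the ones the setup cell reads.

```python
# Keys that query_snowflake() reads from the .env file
REQUIRED_KEYS = (
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PWD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
)

def missing_snowflake_settings(env):
    """Return the required keys that are absent or empty in `env` (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# Usage, after env = {**dotenv_values(".env")}:
# missing = missing_snowflake_settings(env)
# if missing:
#     raise RuntimeError(f"Add these keys to your .env file: {missing}")
```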


## Constructing a Feature
Let's start by building a simple feature -- **the amount of the last transaction a user made**. First, let's run a query against the raw data in Snowflake (feel free to run this yourself in a Snowflake worksheet as well).

```python
# Preview the raw transactions data directly in Snowflake
user_transaction_amount_query = '''
SELECT
    USER_ID,
    AMT,
    TIMESTAMP
FROM
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS
'''
user_transaction_amount = query_snowflake(user_transaction_amount_query)
user_transaction_amount.head(5)
```


|   | USER_ID | AMT | TIMESTAMP |
|---|---|---|---|
| 0 | user_699668125818 | 47.98 | 2022-03-28 17:47:59.847095 |
| 1 | user_457435146833 | 64.81 | 2022-03-28 17:48:01.431271 |
| 2 | user_855115135598 | 15.69 | 2022-03-28 17:48:03.424751 |
| 3 | user_934384811883 | 5.86 | 2022-03-28 17:48:05.245742 |
| 4 | user_650387977076 | 122.57 | 2022-03-28 17:48:07.229087 |


In Tecton, a feature has **three key components**:
1. A set of keys that specify who or what the feature describes (associated with an [Entity](https://docs.tecton.ai/overviews/framework/entities.html)). In the above example, the key is `USER_ID`, meaning this feature describes a property of a user.
2. One or more feature values: the data that will eventually be passed into a model. In the above example, the feature value is `AMT`, the transaction amount.
3. A timestamp for the feature value. In the above example, the timestamp is `TIMESTAMP`, signifying that the feature value is valid as of the moment of the transaction.
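To make the split concrete, here is a plain-Python sketch (no Tecton API involved; `FeatureRecord` is a hypothetical name used only for illustration) that separates the first row of the query result into its keys, values, and timestamp.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FeatureRecord:
    """Illustrative container for the three parts of a feature value."""
    join_keys: dict      # who/what the feature describes, e.g. {"USER_ID": ...}
    values: dict         # the value(s) that will eventually be passed to a model
    timestamp: datetime  # when the value became valid

# First row of the query result above
row = {
    "USER_ID": "user_699668125818",
    "AMT": 47.98,
    "TIMESTAMP": datetime(2022, 3, 28, 17, 47, 59, 847095),
}

# Split the row into its three components
record = FeatureRecord(
    join_keys={"USER_ID": row["USER_ID"]},
    values={"AMT": row["AMT"]},
    timestamp=row["TIMESTAMP"],
)
```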


## Defining a Feature in Tecton
Moving from your Snowflake query to a Tecton feature is simple: wrap the SQL query in a Tecton decorator. Here's what it looks like in practice:

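```python
from datetime import datetime
from tecton import batch_feature_view

# `transactions` (a data source) and `user` (an entity) are defined elsewhere in
# the sample feature repo; import them from the modules where they live.
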
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode='snowflake_sql',
    online=True,
    batch_schedule='1d',
    ttl='30days',
    feature_start_time=datetime(2021, 5, 20),
    description='Last user transaction amount (batch calculated)'
)
def user_last_transaction_amount(transactions):
    return f'''
        SELECT
            USER_ID,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        '''
```

✅  To add this feature to Tecton, simply add it to a new file in your Tecton Feature Repository. **For your convenience, you can find this feature implemented (and commented out) [in this file](feature_repo/features/batch_feature_views/user_last_transaction_amount.py)**.

✅  Once you save your new feature, run `tecton apply` to publish it to Tecton.

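Because this tutorial uses a development workspace, which has no materialization resources, Tecton does not currently materialize this feature even though `online=True` is set. In a workspace with materialization enabled, applying a Feature View with `online=True` makes Tecton automatically backfill feature data to the online store from the specified `feature_start_time` until now, and then materialize new data every `batch_schedule` interval going forward.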
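To picture what that backfill covers, the sketch below splits the time from `feature_start_time` to a given moment into `batch_schedule`-sized windows, one per scheduled run. It assumes contiguous, non-overlapping windows aligned to `feature_start_time`; Tecton may group backfill work into larger jobs, so treat this as a mental model only. `batch_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def batch_windows(feature_start_time, now, batch_schedule=timedelta(days=1)):
    """Return the (start, end) windows of length batch_schedule that end at or before `now`.

    Assumes windows are aligned to feature_start_time (illustration only).
    """
    windows = []
    start = feature_start_time
    # Only windows that have fully elapsed are included
    while start + batch_schedule <= now:
        windows.append((start, start + batch_schedule))
        start += batch_schedule
    return windows

# With feature_start_time=datetime(2021, 5, 20) and batch_schedule='1d',
# three full daily windows have elapsed by midday on 2021-05-23.
windows = batch_windows(datetime(2021, 5, 20), datetime(2021, 5, 23, 12))
```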

As shown in the last tutorial, we can test-run this new Feature View with its `.run()` method:

```python
ws = tecton.get_workspace('YOUR_NAME')  # replace with your workspace name
fv = ws.get_feature_view('user_last_transaction_amount')

# Compute feature values over the last 30 days
start_time = datetime.utcnow() - timedelta(days=30)
end_time = datetime.utcnow()

fv.run(feature_start_time=start_time, feature_end_time=end_time).to_pandas().head()
```
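Conceptually, looking this feature up for a user at some point in time should give the most recent `AMT` at or before that time, treated as missing once it is older than the `ttl='30days'` set in the decorator. The plain-Python sketch below illustrates that point-in-time semantics under this assumption (including the exact window boundaries, which are our choice); it is not how Tecton computes or serves the feature. `last_amount_as_of` and the sample events are hypothetical.

```python
from datetime import datetime, timedelta

def last_amount_as_of(events, user_id, as_of, ttl=timedelta(days=30)):
    """Return the latest AMT for user_id with a timestamp in (as_of - ttl, as_of], else None.

    `events` is a list of (user_id, amt, timestamp) tuples.
    """
    candidates = [
        (ts, amt)
        for uid, amt, ts in events
        if uid == user_id and as_of - ttl < ts <= as_of
    ]
    if not candidates:
        return None  # no transaction yet, or the last one has expired
    # Amount of the most recent transaction in the window
    return max(candidates, key=lambda c: c[0])[1]

# Made-up sample events: (USER_ID, AMT, TIMESTAMP)
events = [
    ("user_a", 10.0, datetime(2022, 3, 1)),
    ("user_a", 25.0, datetime(2022, 3, 20)),
    ("user_b", 5.0, datetime(2022, 3, 21)),
]
```

For example, `last_amount_as_of(events, "user_a", datetime(2022, 3, 10))` ignores the later 25.0 transaction and returns 10.0, while a lookup on 2022-05-01 returns `None` because the last value is older than 30 days.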

## Using Tecton time-windowed aggregations
Sliding time-windowed aggregations are common ML features for event data, but defining them by hand in SQL can be error-prone and inefficient.

Tecton provides built-in implementations of common time-windowed aggregations that simplify transformation logic and ensure correct feature value computation. Additionally, Tecton optimizes the compute and storage of these aggregations to maximize efficiency.

For these reasons, we recommend using Tecton’s built-in aggregations whenever possible.

Time-windowed aggregations can be specified in the Batch Feature View decorator using the `aggregations` and `aggregation_slide_period` parameters.

Tecton expects the provided SQL query to select the raw events (with their timestamps) to be aggregated.
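For intuition, the plain-Python sketch below shows what a `max` over 1-day, 3-day, and 7-day windows with a 1-day slide period computes, using one common technique for doing it efficiently: first reduce raw events to one partial max per user per day ("tiles"), then combine the tiles that fall inside each window. This is an illustration under assumptions (day-aligned windows that each cover whole calendar days), not a description of Tecton's internals; the function names and sample events are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

def daily_max_tiles(events):
    """Reduce (user_id, amt, timestamp) events to {(user_id, day): max amt}."""
    tiles = defaultdict(lambda: float("-inf"))
    for user_id, amt, ts in events:
        key = (user_id, ts.date())
        tiles[key] = max(tiles[key], amt)
    return dict(tiles)

def windowed_max(tiles, user_id, window_end_day, window_days):
    """Max over the `window_days` daily tiles ending on window_end_day (inclusive).

    Returns None when the user has no events in the window.
    """
    days = [window_end_day - timedelta(days=i) for i in range(window_days)]
    values = [tiles[(user_id, d)] for d in days if (user_id, d) in tiles]
    return max(values) if values else None

# Made-up sample events: (USER_ID, AMT, TIMESTAMP)
events = [
    ("user_a", 60.0, datetime(2022, 3, 22, 9)),
    ("user_a", 45.0, datetime(2022, 3, 26, 14)),
    ("user_a", 30.0, datetime(2022, 3, 28, 8)),
    ("user_a", 12.0, datetime(2022, 3, 28, 19)),
]
tiles = daily_max_tiles(events)

for window in (1, 3, 7):
    print(window, windowed_max(tiles, "user_a", date(2022, 3, 28), window))
# 1 30.0
# 3 45.0
# 7 60.0
```

The design point is that each day's tile is computed once and then reused by every window that covers that day, instead of rescanning raw events for the 1d, 3d, and 7d features separately.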

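```python
from datetime import datetime
from tecton import batch_feature_view, FeatureAggregation

# `transactions` (a data source) and `user` (an entity) are defined elsewhere in
# the sample feature repo; import them from the modules where they live.
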
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode='snowflake_sql',
    online=True,
    feature_start_time=datetime(2021, 5, 20),
    description='Max transaction amounts for the user in various time windows',
    aggregation_slide_period='1d',
    aggregations=[FeatureAggregation(column='AMT', function='max', time_windows=['1d','3d','7d'])],
)
def user_max_transactions(transactions):
    return f'''
        SELECT
            USER_ID,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        '''
```

✅  To add this feature to Tecton, simply add it to a new file in your Tecton Feature Repository. **For your convenience, you can find this feature implemented (and commented out) [in this file](feature_repo/features/batch_feature_views/user_max_transactions.py)**.

✅  Once you save your new feature, run `tecton apply` to publish it to Tecton.

Now we can test this feature below.

```python
fv = ws.get_feature_view('user_max_transactions')

start_time = datetime.utcnow() - timedelta(days=30)
end_time = datetime.utcnow()

# Fill any missing (NaN) aggregation values with 0 for display
fv.run(feature_start_time=start_time, feature_end_time=end_time).to_pandas().fillna(0).head()
```

To add these features to your model's feature set, extend the list of Feature Views in your [Feature Service](feature_repo/feature_services/fraud_detection.py).