# 2. Building Features with Tecton and Snowflake

In this tutorial, we'll use Tecton and Snowflake to build features for machine learning. We'll cover:
* How to register features with Tecton
* How features are written in Tecton
* How to use Tecton Aggregations to do easy window aggregations

## Setup

✅ Run the cells below.

In [None]:
import logging
import os
import tecton
from dotenv import load_dotenv
import pandas as pd
import snowflake.connector
from datetime import date, datetime, timedelta
from pprint import pprint

In [None]:
load_dotenv()  # take environment variables from .env.
logging.getLogger('snowflake.connector').setLevel(logging.WARNING)
logging.getLogger('snowflake.snowpark').setLevel(logging.WARNING)

connection_parameters = {
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "warehouse": "TRIAL_WAREHOUSE",
    # Tecton requires a database and schema in order to create various temporary objects
    "database": "TECTON",
    "schema": "PUBLIC",
}
conn = snowflake.connector.connect(**connection_parameters)
tecton.snowflake_context.set_connection(conn) # Tecton will use this Snowflake connection for all interactive queries

# Quick helper function to query snowflake from a notebook
# Make sure to replace with the appropriate connection details for your own account
def query_snowflake(query):
    df = conn.cursor().execute(query).fetch_pandas_all()
    return df

ws = tecton.get_workspace('prod')
tecton.version.summary()

### ❓ Before we start -- Tecton Workspaces

[Workspaces](https://docs.tecton.ai/overviews/workspaces.html) are sandbox environments that can be used for experimenting with a Feature Repo without affecting the production environment. Changes made in one workspace have no effect on other workspaces.

By default, new "development" workspaces do not have access to materialization and storage resources. Instead, transformations can be run ad-hoc in your Snowflake Warehouse. This means that the Tecton SDK builds a query that reads directly from your raw data tables, and executes it in your Snowflake Warehouse.

This ad-hoc computation functionality can be used in any workspace and allows you to easily test features without needing to backfill and materialize data to the Feature Store.

New workspaces with full materialization and storage resources can be created by adding the `--live` flag to the `tecton workspace create` CLI command shown below. This can be useful for creating staging environments for testing features online before pushing changes to prod, or for creating isolation between different teams.

**In this tutorial, we'll create a new workspace to ensure our changes don't affect others' workloads.**

### ✅ Create your own Tecton Workspace
In this tutorial, we'll create a new [Workspace](https://docs.tecton.ai/docs/setting-up-tecton/administration-setup/creating-a-workspace-and-adding-users-to-the-workspace) to test our changes.

Workspaces are created using the Tecton CLI. Create one now by running `tecton workspace create YOUR_NAME`:

```
$ tecton workspace create YOUR_NAME
Created workspace "YOUR_NAME".
Switched to workspace "YOUR_NAME".

You're now on a new, empty workspace. Workspaces isolate their state,
so if you run "tecton plan" Tecton will not see any existing state
for this configuration.
```

> 💡**Tip:** For a complete list of workspace commands, simply run `tecton workspace -h`

Then, grab a reference to the new Workspace so we can use it later.

In [None]:
ws = tecton.get_workspace("YOUR_NAME")

### ✅ Clone the Sample Feature Repo
In Tecton, a [feature repository](https://docs.tecton.ai/docs/introduction/tecton-concepts#feature-repository) is a collection of declarative Python files that define feature pipelines. In this tutorial, we'll clone a pre-populated feature repository to use as a starting point.

The [sample feature repository for this demo can be found here](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo). If you checked out this git repository to get a copy of this tutorial, you already have the important files downloaded. If not, clone the sample repository.

### ✅ Apply the Sample Feature Repo

To register a local feature repository with Tecton, [you'll use the Tecton CLI.](https://docs.tecton.ai/examples/managing-feature-repos.html) Since you are working in a new Workspace, it does not currently have anything registered, so your first time adding features should be simple.

Navigate to the feature repository's directory in the command line:
```
cd feature_repo
```


Then run the following command to register your feature definitions with Tecton:
```
tecton apply
```


Check that the workspace you are applying to is the correct one, then confirm the plan with `y`.

> 💡 **Tip:** You can always compare your local Feature Repo to the remote Feature Registry before applying it by running `tecton plan`.

## Constructing a Feature
Let's start by building a simple feature -- **the amount of the last transaction a user made**. First, let's run a query against the raw data in Snowflake (feel free to run this yourself in a Snowflake worksheet as well).

In [None]:
# Preview the data directly
user_transaction_amount_query = '''
SELECT
    USER_ID,
    AMT,
    TIMESTAMP
FROM
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS
LIMIT 10
'''
user_transaction_amount = query_snowflake(user_transaction_amount_query)
user_transaction_amount.head(5)

In Tecton, a feature has **three key components**:
1. A set of keys that specify who or what the feature is describing (associated with an [Entity](https://docs.tecton.ai/overviews/framework/entities.html)). In the above example, the key is `USER_ID`, meaning this feature is describing a property about a user.
2. One or more feature values -- the stuff that's going to eventually get passed into a model.  In the above example, the feature is `AMT`, the amount of the transaction.
3. A timestamp for the feature value. In the above example, the timestamp is `TIMESTAMP`, signifying that the feature is valid as of the moment of the transaction.
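
To make the three components concrete, here is a minimal pure-Python sketch (not Tecton code) that derives "the amount of the last transaction a user made" from a few rows. The row values and the `last_amount_as_of` helper are hypothetical and exist only for illustration; they assume rows shaped like the query result above.

```python
from datetime import datetime

# Made-up rows shaped like the query result: (USER_ID, AMT, TIMESTAMP).
rows = [
    ("user_1", 20.0, datetime(2022, 1, 1, 9, 0)),
    ("user_1", 75.5, datetime(2022, 1, 3, 14, 30)),
    ("user_2", 12.0, datetime(2022, 1, 2, 8, 15)),
]


def last_amount_as_of(rows, user_id, as_of):
    """Return the AMT of the user's most recent transaction at or before `as_of`."""
    candidates = [
        (ts, amt) for uid, amt, ts in rows if uid == user_id and ts <= as_of
    ]
    if not candidates:
        return None  # no feature value is known yet for this key
    # max() on (timestamp, amount) tuples picks the latest timestamp.
    return max(candidates)[1]


print(last_amount_as_of(rows, "user_1", datetime(2022, 1, 2)))  # 20.0
print(last_amount_as_of(rows, "user_1", datetime(2022, 1, 4)))  # 75.5
```

The key (`USER_ID`) selects whose feature we want, the timestamp decides which value was valid at the moment we ask, and the value (`AMT`) is what would be passed to a model.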


## Defining a Feature to Tecton
Moving from your Snowflake query to a Tecton feature is simple: wrap the SQL query in a Python function and apply a Tecton decorator to it. Here's what it looks like in practice:

```python
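from datetime import datetime, timedelta

from tecton import batch_feature_view

# `transactions` (data source) and `user` (entity) are assumed to be
# imported from elsewhere in the feature repo.
@batch_feature_view(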
    sources=[transactions],
    entities=[user],
    mode='snowflake_sql',
    online=True,
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=30),
    feature_start_time=datetime(2021, 5, 20),
    description='Last user transaction amount (batch calculated)'
)
def user_last_transaction_amount(transactions):
    return f'''
        SELECT
            USER_ID,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        '''
```

✅  To add this feature to Tecton, simply add it to a new file in your Tecton Feature Repository. **For your convenience, you can find this feature implemented (and commented out) [in this file](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo/blob/main/feature_repo/features/batch_feature_views/user_last_transaction_amount.py)**.

✅  Once you save your new feature, run `tecton apply` to publish it to Tecton.


In [None]:
ws.list_feature_views()

In [None]:
fv = ws.get_feature_view('user_last_transaction_amount')

end_time = datetime.utcnow()
start_time = end_time - timedelta(days=60)

fv.run(start_time=start_time, end_time=end_time).to_pandas().head()
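
The `ttl=timedelta(days=30)` setting in the feature view above limits how far back a lookup will reach for a value. The following is a rough pure-Python sketch of that idea, not Tecton's implementation: the `lookup_with_ttl` helper and the sample values are hypothetical, and the exact boundary handling is an assumption.

```python
from datetime import datetime, timedelta

TTL = timedelta(days=30)  # mirrors ttl=timedelta(days=30) in the feature view


def lookup_with_ttl(feature_rows, as_of, ttl=TTL):
    """Return the latest value whose timestamp lies within `ttl` of `as_of`.

    feature_rows: list of (timestamp, value) pairs for a single user.
    Returns None when every known value is older than the TTL.
    """
    valid = [(ts, v) for ts, v in feature_rows if as_of - ttl < ts <= as_of]
    return max(valid)[1] if valid else None


# Made-up feature values for one user.
user_rows = [
    (datetime(2022, 1, 1), 20.0),
    (datetime(2022, 1, 20), 75.5),
]

print(lookup_with_ttl(user_rows, datetime(2022, 2, 1)))   # 75.5
print(lookup_with_ttl(user_rows, datetime(2022, 3, 15)))  # None
```

In the second lookup the user's last transaction is more than 30 days old, so no value is returned even though older data exists.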

## Using Tecton time-windowed aggregations
Sliding time-windowed aggregations are common ML features for event data, but defining them in a view can be error-prone and inefficient.

Tecton provides built-in implementations of common time-windowed aggregations that simplify transformation logic and ensure correct feature value computation. Additionally, Tecton optimizes the compute and storage of these aggregations to maximize efficiency.

For these reasons, we recommend using Tecton’s built-in aggregations whenever possible.

Time-windowed aggregations can be specified in the [Batch Feature View](https://docs.tecton.ai/docs/defining-features/feature-views/batch-feature-view/#creating-features-that-use-time-windowed-aggregations) decorator using the `aggregations` and `aggregation_interval` parameters.

Tecton expects the provided SQL query to select the raw events (with timestamps) to be aggregated.

```python
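from datetime import datetime, timedelta

from tecton import Aggregation, batch_feature_view

# `transactions` (data source) and `user` (entity) are assumed to be
# imported from elsewhere in the feature repo.
@batch_feature_view(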
    sources=[transactions],
    entities=[user],
    mode='snowflake_sql',
    online=True,
    feature_start_time=datetime(2021, 5, 20),
    description='Max transaction amounts for the user in various time windows',
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column='AMT', function='max', time_window=timedelta(days=1)),
        Aggregation(column='AMT', function='max', time_window=timedelta(days=30)),
        Aggregation(column='AMT', function='max', time_window=timedelta(days=180)),
    ],
)
def user_max_transactions(transactions):
    return f'''
        SELECT
            USER_ID,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        '''
```

✅  To add this feature to Tecton, simply add it to a new file in your Tecton Feature Repository. **For your convenience, you can find this feature implemented (and commented out) [in this file](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo/blob/main/feature_repo/features/batch_feature_views/user_max_transactions.py)**.

✅  Once you save your new feature, run `tecton apply` to publish it to Tecton.

Now we can test this feature below.

In [None]:
fv = ws.get_feature_view('user_max_transactions')

start_time = datetime.combine(date.today()-timedelta(days=180), datetime.min.time())
end_time = datetime.combine(date.today(), datetime.min.time())

fv.run(start_time=start_time, end_time=end_time).to_pandas().fillna(0).head()
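
To help interpret the three aggregate columns returned above, here is a conceptual pure-Python sketch of what a `max` aggregation over 1-, 30-, and 180-day windows computes for a single user. It is an illustration only: the `windowed_max` helper and the event values are hypothetical, and it ignores how Tecton actually computes and stores aggregations.

```python
from datetime import datetime, timedelta

# Made-up raw events for one user: (TIMESTAMP, AMT).
events = [
    (datetime(2021, 10, 1), 90.0),
    (datetime(2022, 1, 20), 50.0),
    (datetime(2022, 1, 30, 12), 25.0),
]


def windowed_max(events, as_of, window):
    """Max AMT over events with a timestamp in (as_of - window, as_of]."""
    in_window = [amt for ts, amt in events if as_of - window < ts <= as_of]
    # No events in the window means there is no value to report.
    return max(in_window) if in_window else None


as_of = datetime(2022, 1, 31)
for days in (1, 30, 180):
    print(days, windowed_max(events, as_of, timedelta(days=days)))
# 1 25.0
# 30 50.0
# 180 90.0
```

Windows with no events yield no value, which is why the cell above calls `fillna(0)` before displaying the result.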