# 1. Introduction to Tecton on Snowflake

## 1) Setup

Before getting started, let's do some setup to get your computer ready to interact with Tecton.

### 1.1) Install the Tecton CLI on your local machine

<div class="alert alert-block alert-warning">
Tecton requires Python 3.8. We also recommend installing the `tecton` package into a Python virtual environment.
</div>

To install the Tecton CLI and other dependencies on your local machine, run the following command in this folder:

✅ `$ pip install -r requirements.txt`

If you run into any issues, follow [these instructions in the Tecton Docs](https://docs.tecton.ai/v2/setting-up-tecton/02-tecton-cli-setup.html) to set up the Tecton CLI.

Once you have finished installing the CLI, you can log in to your Tecton cluster with the following command (replace `<your-cluster>` with your cluster name):

✅ `$ tecton login <your-cluster>.tecton.ai`

### 1.2) Clone the Tecton sample repository

This tutorial uses [a sample repository full of pre-built features and data sources](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo).

Before you get started, clone this repository to your local machine using:

✅ `$ git clone https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo.git`

### 1.3) Configure your environment with Snowflake credentials

You'll need to set three environment variables to connect to Snowflake.

* `SNOWFLAKE_USER`: your username in the Snowflake account that you're using with Tecton
* `SNOWFLAKE_PASSWORD`: your password in the Snowflake account that you're using with Tecton
* `SNOWFLAKE_ACCOUNT`: the identifier of the Snowflake account you're using with Tecton, i.e. the `<SNOWFLAKE_ACCOUNT>` part of `<SNOWFLAKE_ACCOUNT>.snowflakecomputing.com`

We recommend storing these environment variables in a file named `.env`.

To create this file, uncomment the cell below or create it manually in the same directory as this notebook.

```python
# %%writefile .env
# SNOWFLAKE_USER=<YOUR_SNOWFLAKE_USER>
# SNOWFLAKE_PASSWORD=<YOUR_SNOWFLAKE_PASSWORD>
# SNOWFLAKE_ACCOUNT=<SNOWFLAKE_ACCOUNT>
```
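The connection code in the next section reads these variables with `os.environ[...]`, which fails with a bare `KeyError` if one is missing. If you prefer a clearer error, a check like the one below could run first. This is a minimal sketch: `check_snowflake_env` and `normalize_account` are hypothetical helpers, not part of Tecton or the Snowflake connector, and stripping the `.snowflakecomputing.com` suffix assumes the connector's `account` parameter expects the bare account identifier.

```python
import os

REQUIRED_VARS = ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT")
SUFFIX = ".snowflakecomputing.com"


def normalize_account(account: str) -> str:
    """Hypothetical helper: drop a trailing '.snowflakecomputing.com' if a full hostname was pasted."""
    account = account.strip()
    if account.lower().endswith(SUFFIX):
        account = account[: -len(SUFFIX)]
    return account


def check_snowflake_env(env=None) -> dict:
    """Hypothetical helper: return Snowflake settings, raising if any required variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        "account": normalize_account(env["SNOWFLAKE_ACCOUNT"]),
    }
```

Once `load_dotenv()` has run, calling `check_snowflake_env()` with no arguments reads from `os.environ`; its keys match the `user`, `password`, and `account` entries of the connection parameters.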

### 1.4) Import some packages and verify that Tecton is installed

✅ Run the cell below. It reads the Snowflake credentials from the `.env` file you created earlier (or from variables already set in your environment) via `load_dotenv()`.

```python
# Import Tecton and other libraries
import logging
import os
from datetime import datetime, timedelta
from pprint import pprint

import pandas as pd
import snowflake.connector
import tecton
from dotenv import load_dotenv

load_dotenv()  # Load environment variables from .env (existing variables are not overridden)
logging.getLogger('snowflake.connector').setLevel(logging.WARNING)
logging.getLogger('snowflake.snowpark').setLevel(logging.WARNING)

# Replace the warehouse, database, and schema with the appropriate values for your own account
connection_parameters = {
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "warehouse": "TRIAL_WAREHOUSE",
    # Tecton needs a database and schema in which to create temporary objects
    "database": "TECTON",
    "schema": "PUBLIC",
}
conn = snowflake.connector.connect(**connection_parameters)
tecton.snowflake_context.set_connection(conn)  # Tecton will use this Snowflake connection for all interactive queries


# Helper to run a SQL query on Snowflake and return the result as a pandas DataFrame
def query_snowflake(query):
    with conn.cursor() as cur:
        return cur.execute(query).fetch_pandas_all()


tecton.version.summary()
```

```
Version: 0.4.0b22
Git Commit: e7486661deb9d2d8cb26bbddecc3f9201197323e
Build Datetime: 2022-04-14T18:22:37
```


## 2) Interact with Tecton
Your Tecton account has been seeded with data and some example features that you can use to test out Tecton.

First, you can explore some of the raw data that has been connected to Tecton: historical transactions. Note that we first select the [Tecton workspace](https://docs.tecton.ai/overviews/workspaces.html) `prod`, which contains the objects we want to fetch.


```python
# Get the "transactions" data source from the "prod" workspace and display a summary of it
ws = tecton.get_workspace('prod')
ds = ws.get_data_source('transactions')
ds.summary()
```

| Property | Value |
|---|---|
| Name | transactions |
| Workspace | prod |
| Description | |
| Created At | 2022-04-13 23:23:44 UTC |
| Owner | |
| Last Modified By | david@tecton.ai |
| Family | |
| Source Filename | data_sources/transactions.py |
| Tags | {} |
| Batch Data Source | Type: Snowflake<br>URL:<br>Database: TECTON_DEMO_DATA<br>Schema: FRAUD_DEMO |


### 2.1) Preview the raw data from the data source

```python
transactions_query = '''
SELECT 
    *
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS 
ORDER BY TIMESTAMP DESC
LIMIT 50
'''
transactions = query_snowflake(transactions_query)
transactions.head(5)
```

| | USER_ID | TRANSACTION_ID | CATEGORY | AMT | IS_FRAUD | MERCHANT | MERCH_LAT | MERCH_LONG | TIMESTAMP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | user_884240387242 | 4514e656ed504fa972829d662cfe82e8 | food_dining | 79.53 | 0 | fraud_Zboncak LLC | 29.363853 | -82.849269 | 2022-04-15 19:18:45.498749 |
| 1 | user_459842889956 | 198b2a199c6faaa73c3065423c614540 | gas_transport | 77.96 | 0 | fraud_Olson, Becker and Koch | 40.571016 | -97.360469 | 2022-04-15 19:18:43.881500 |
| 2 | user_461615966685 | 285225af223c2ddafd51a3e3ca907444 | food_dining | 61.62 | 0 | fraud_Turner, Ziemann and Lehner | 35.778243 | -88.354864 | 2022-04-15 19:18:40.480535 |
| 3 | user_499975010057 | ee63cd81a62cc557094b751bcdd1dbf0 | grocery_pos | 110.28 | 0 | fraud_Auer-Mosciski | 40.272594 | -75.064409 | 2022-04-15 19:18:37.213254 |
| 4 | user_644787199786 | ec9a569a1a0102ff1cd3f48d7f990b35 | misc_pos | 6.08 | 0 | fraud_McCullough LLC | 27.056076 | -81.490004 | 2022-04-15 19:18:34.963195 |


### 2.2) Tecton Feature Views

In Tecton, features are registered as [Feature Views](https://docs.tecton.ai/overviews/framework/feature_views/feature_views.html). These views contain all of the information needed to transform raw data from data sources into features.
 
Feature Views can make feature data available in two places:
* Offline: You can retrieve historical feature values using [time travel](https://www.tecton.ai/blog/time-travel-in-ml/).
* Online: You can retrieve current feature values in real time via Tecton's [real-time serving API](https://docs.tecton.ai/examples/fetch-real-time-features.html).
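To make the offline/online distinction concrete, the toy sketch below keeps a history of feature values per entity key. An offline lookup answers "what was the value as of time t?", while an online lookup returns only the latest value. This is a simplified in-memory illustration, not how Tecton stores or serves features; `FeatureHistory` and its methods are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class FeatureHistory:
    """Hypothetical in-memory store of (timestamp, value) pairs per entity key."""

    def __init__(self):
        self._rows = defaultdict(list)  # key -> list of (timestamp, value), kept sorted by timestamp

    def write(self, key, ts: datetime, value):
        rows = self._rows[key]
        rows.append((ts, value))
        rows.sort(key=lambda r: r[0])  # re-sort on each write; fine for a toy example

    def offline_lookup(self, key, as_of: datetime):
        """Time travel: the latest value with timestamp <= as_of, or None if none existed yet."""
        rows = self._rows.get(key, [])
        idx = bisect_right([ts for ts, _ in rows], as_of)
        return rows[idx - 1][1] if idx > 0 else None

    def online_lookup(self, key):
        """Serving: the most recent value for the key, or None if the key is unknown."""
        rows = self._rows.get(key, [])
        return rows[-1][1] if rows else None


store = FeatureHistory()
store.write("user_1", datetime(2022, 4, 1), 10)
store.write("user_1", datetime(2022, 4, 10), 25)
print(store.offline_lookup("user_1", datetime(2022, 4, 5)))  # 10
print(store.online_lookup("user_1"))  # 25
```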

Feature Views can also be run ad hoc for testing or previewing data using `.run()`. Let's run the `merchant_fraud_rate` Feature View over the last 30 days. For each merchant, it computes the fraction of transactions that were fraudulent over several trailing windows (24, 72, 168, and 960 hours, judging by the column names). After running the Feature View, we sort by the 72-hour fraud rate to see the merchants with the highest recent fraud rate.

```python
fv = ws.get_feature_view('merchant_fraud_rate')

# Compute features for the last 30 days
end_time = datetime.utcnow()
start_time = end_time - timedelta(days=30)

features = fv.run(feature_start_time=start_time, feature_end_time=end_time).to_pandas()

# Merchants with the highest 72-hour fraud rate
features.sort_values(by="IS_FRAUD_MEAN_72H_1D", ascending=False).head(5)
```

| | MERCHANT | TIMESTAMP | IS_FRAUD_MEAN_24H_1D | IS_FRAUD_MEAN_72H_1D | IS_FRAUD_MEAN_168H_1D | IS_FRAUD_MEAN_960H_1D |
|---|---|---|---|---|---|---|
| 807 | fraud_Kessler Group | 1649980800000000000 | 0.1 | 0.1 | 0.1 | 0.1 |
| 5895 | fraud_Shields Inc | 1649721600000000000 | 0.090909 | 0.090909 | 0.090909 | 0.090909 |
| 3155 | fraud_Greenholt, O'Hara and Balistreri | 1650067200000000000 | 0.083333 | 0.083333 | 0.083333 | 0.083333 |
| 3783 | fraud_Nader-Heller | 1650067200000000000 | 0.083333 | 0.083333 | 0.083333 | 0.083333 |
| 1223 | fraud_McGlynn-Heathcote | 1650067200000000000 | 0.083333 | 0.083333 | 0.083333 | 0.083333 |
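As a rough mental model of what a column like `IS_FRAUD_MEAN_72H_1D` represents, the pandas sketch below computes, for each merchant, the mean of `IS_FRAUD` over a trailing 72-hour window, evaluated once per day. It runs on a small made-up DataFrame and is an approximation for illustration only, not the Feature View's actual definition. Two things are assumptions: that the `_1D` suffix means a daily evaluation interval, and that each window covers `[t - window, t)`.

```python
import pandas as pd


def trailing_fraud_rate(transactions: pd.DataFrame, window: str, eval_times) -> pd.DataFrame:
    """Mean IS_FRAUD per merchant over [t - window, t) for each evaluation time t (assumed semantics)."""
    window_td = pd.Timedelta(window)
    rows = []
    for t in eval_times:
        mask = (transactions["TIMESTAMP"] >= t - window_td) & (transactions["TIMESTAMP"] < t)
        rates = transactions.loc[mask].groupby("MERCHANT")["IS_FRAUD"].mean()
        for merchant, rate in rates.items():
            rows.append({"MERCHANT": merchant, "TIMESTAMP": t, "IS_FRAUD_MEAN": rate})
    return pd.DataFrame(rows, columns=["MERCHANT", "TIMESTAMP", "IS_FRAUD_MEAN"])


# Made-up example data (not from the demo dataset)
txns = pd.DataFrame({
    "MERCHANT": ["m1", "m1", "m1", "m2", "m2"],
    "IS_FRAUD": [1, 0, 0, 0, 0],
    "TIMESTAMP": pd.to_datetime([
        "2022-04-12 10:00", "2022-04-13 09:00", "2022-04-14 08:00",
        "2022-04-13 12:00", "2022-04-14 15:00",
    ]),
})
eval_times = pd.date_range("2022-04-13", "2022-04-15", freq="D")  # one evaluation per day (assumed)
rates = trailing_fraud_rate(txns, "72h", eval_times)
print(rates.sort_values("IS_FRAUD_MEAN", ascending=False).head())
```

Swapping `"72h"` for `"24h"`, `"168h"`, or `"960h"` mirrors the other window lengths in the output above.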


## 3) Generate Training Data
Once you've built some features, you'll want to join them together to generate training data. 

### 3.1) Tecton Feature Services
In Tecton, features that are needed for training or predictions are grouped together into a [Feature Service](https://docs.tecton.ai/overviews/framework/feature_services.html). Typically you have one Feature Service per ML model. Let's explore a Feature Service that we've already built.

```python
fs = ws.get_feature_service('fraud_detection_feature_service')
fs.summary()
```

| Property | Value |
|---|---|
| Name | fraud_detection_feature_service |
| Owner | |
| Last Updated By | david@tecton.ai |
| Description | |
| Family | |
| Entities | ['fraud_user', 'category'] |
| Online Serving | Enabled |
| Logging | Disabled |
| Online Join Keys | USER_ID, CATEGORY |
| Offline Join Keys | USER_ID, CATEGORY |


The `fraud_detection_feature_service` comprises 13 features that can be used together to train a fraud detection model.

### 3.2) Build a Spine

Let's use the `fraud_detection_feature_service` to train a model that scores transactions as either fraudulent or non-fraudulent. To start, let's look up some labeled transactions that we'll use for training.

We can see in the summary above that the `fraud_detection_feature_service` uses `USER_ID` and `CATEGORY` as join keys to fetch all the relevant features. Together with an event timestamp and a label column (`IS_FRAUD`), these represent our list of historical training events. In Tecton we call this a "spine". The query below also selects `MERCHANT`; extra spine columns such as `MERCHANT` and `IS_FRAUD` are carried through to the training data.

> 💡 A spine is expected to include the entity join keys for the Feature Views in a Feature Service as well as a timestamp column for time-travel lookups. A label column is not strictly necessary but is typically included if you generate a training dataset.
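If you build the spine as a pandas DataFrame, a quick sanity check before calling Tecton can save a failed query. The sketch below is a hypothetical helper (`validate_spine` is not a Tecton function): it checks that the join keys and timestamp column are present, that the timestamp column has a datetime dtype, and flags null join keys. The defaults are the `USER_ID`/`CATEGORY` join keys and `TIMESTAMP` column used in this tutorial; whether Tecton itself tolerates null keys is not covered here.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def validate_spine(spine: pd.DataFrame, join_keys=("USER_ID", "CATEGORY"), timestamp_key="TIMESTAMP"):
    """Hypothetical helper: return a list of problems found in a spine (an empty list means it looks OK)."""
    problems = []
    missing = [c for c in (*join_keys, timestamp_key) if c not in spine.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # later checks need these columns
    if not is_datetime64_any_dtype(spine[timestamp_key]):
        problems.append(f"{timestamp_key} is not a datetime column")
    null_keys = [c for c in join_keys if spine[c].isna().any()]
    if null_keys:
        problems.append(f"null values in join keys: {null_keys}")
    return problems


spine = pd.DataFrame({
    "USER_ID": ["user_884240387242", "user_459842889956"],
    "CATEGORY": ["food_dining", None],
    "TIMESTAMP": pd.to_datetime(["2022-04-15 19:18:45", "2022-04-15 19:18:43"]),
    "IS_FRAUD": [0, 0],  # optional label column
})
print(validate_spine(spine))
```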

```python
# Select the join keys, timestamp, and label for the 1,000 most recent transactions
transactions_query = '''
SELECT 
    MERCHANT,
    USER_ID,
    CATEGORY,
    TIMESTAMP,
    IS_FRAUD
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS 
ORDER BY TIMESTAMP DESC
LIMIT 1000
'''
transactions = query_snowflake(transactions_query)
transactions.head(5)
```

| | MERCHANT | USER_ID | CATEGORY | TIMESTAMP | IS_FRAUD |
|---|---|---|---|---|---|
| 0 | fraud_Zboncak LLC | user_884240387242 | food_dining | 2022-04-15 19:18:45.498749 | 0 |
| 1 | fraud_Olson, Becker and Koch | user_459842889956 | gas_transport | 2022-04-15 19:18:43.881500 | 0 |
| 2 | fraud_Turner, Ziemann and Lehner | user_461615966685 | food_dining | 2022-04-15 19:18:40.480535 | 0 |
| 3 | fraud_Auer-Mosciski | user_499975010057 | grocery_pos | 2022-04-15 19:18:37.213254 | 0 |
| 4 | fraud_McCullough LLC | user_644787199786 | misc_pos | 2022-04-15 19:18:34.963195 | 0 |


### 3.3) Get Training Data with `get_historical_features`

To retrieve training data, we'll use Tecton's `get_historical_features` API, which joins the 13 features in `fraud_detection_feature_service` onto our historical transactions.

`get_historical_features` expects a spine in the form of either a pandas DataFrame or a Snowflake SQL query. Here we pass the `transactions_query` SQL string from above.

```python
training_data = fs.get_historical_features(spine=transactions_query, timestamp_key="TIMESTAMP").to_pandas()
training_data.head(10)
```

| | USER_ID | CATEGORY | TIMESTAMP | MERCHANT | IS_FRAUD | USER_TRANSACTION_METRICS__TRANSACTION_SUM_24H_1D | USER_TRANSACTION_METRICS__TRANSACTION_SUM_72H_1D | USER_TRANSACTION_METRICS__TRANSACTION_SUM_168H_1D | USER_TRANSACTION_METRICS__TRANSACTION_SUM_960H_1D | USER_TRANSACTION_METRICS__AMT_MEAN_24H_1D | USER_TRANSACTION_METRICS__AMT_MEAN_72H_1D | USER_TRANSACTION_METRICS__AMT_MEAN_168H_1D | USER_TRANSACTION_METRICS__AMT_MEAN_960H_1D | USER_CATEGORY_COUNT__TRANSACTION_SUM_24H_1D | USER_CATEGORY_COUNT__TRANSACTION_SUM_72H_1D | USER_CATEGORY_COUNT__TRANSACTION_SUM_168H_1D | USER_CATEGORY_COUNT__TRANSACTION_SUM_960H_1D |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_205125746682 | health_fitness | 2022-04-15 19:18:22.850551 | fraud_Ratke and Sons | 0 | 524 | 1554 | 3663 | 3678 | 59.157786 | 58.612664 | 60.713366 | 60.590965 | 32 | 101 | 255 | 256 |
| 1 | user_939970169861 | misc_net | 2022-04-15 19:09:19.272558 | fraud_Kunde-Sanford | 0 | 1378 | 4146 | 9694 | 9739 | 60.051168 | 65.578324 | 64.497914 | 64.419719 | 86 | 247 | 565 | 567 |
| 2 | user_469998441571 | gas_transport | 2022-04-15 19:04:06.229037 | fraud_Parisian and Sons | 0 | 655 | 2104 | 4953 | 4984 | 68.744336 | 64.616245 | 62.753725 | 62.647103 | 107 | 325 | 716 | 718 |
| 3 | user_460877961787 | food_dining | 2022-04-15 18:58:14.283979 | fraud_Bechtelar-Rippin | 0 | 520 | 1550 | 3579 | 3599 | 62.520981 | 60.992697 | 59.309944 | 59.381448 | 45 | 121 | 283 | 284 |
| 4 | user_724235628997 | misc_pos | 2022-04-15 18:41:23.688942 | fraud_Wintheiser, Dietrich and Schimmel | 0 | 1186 | 3630 | 8491 | 8540 | 64.722639 | 62.765975 | 64.833205 | 64.831999 | 69 | 225 | 531 | 536 |
| 5 | user_469998441571 | misc_net | 2022-04-15 18:47:22.151242 | fraud_Durgan-Auer | 0 | 655 | 2104 | 4953 | 4984 | 68.744336 | 64.616245 | 62.753725 | 62.647103 | 42 | 134 | 308 | 311 |
| 6 | user_650387977076 | grocery_net | 2022-04-15 19:17:52.558245 | fraud_Cummings Group | 0 | 1527 | 4594 | 10803 | 10872 | 70.688991 | 75.248888 | 73.206116 | 73.183302 | 92 | 275 | 640 | 645 |
| 7 | user_884240387242 | kids_pets | 2022-04-15 19:14:02.879446 | fraud_Bogisich-Weimann | 0 | 2155 | 6180 | 14538 | 14629 | 76.234529 | 72.071833 | 73.434384 | 73.395734 | 190 | 568 | 1301 | 1306 |
| 8 | user_687958452057 | food_dining | 2022-04-15 19:05:24.870913 | fraud_Feil, Hilpert and Koss | 0 | 878 | 2603 | 6154 | 6188 | 97.17303 | 94.237 | 94.669072 | 94.645422 | 73 | 212 | 495 | 499 |
| 9 | user_699668125818 | grocery_pos | 2022-04-15 19:03:04.426699 | fraud_McDermott-Weimann | 0 | 633 | 1965 | 4793 | 4821 | 71.724202 | 65.750656 | 64.716831 | 64.742211 | 61 | 187 | 484 | 486 |


### What is happening behind the scenes

Behind the scenes, Tecton is doing a row-level, [point-in-time correct](https://www.tecton.ai/blog/time-travel-in-ml/) join: for each row in the spine, it looks up the feature values as they were at that row's timestamp, so no information from after the event leaks into the training data. This helps ensure that the data you use to train your models is drawn from the same distribution as the data your model will see in production.

Another benefit: you don't need to reconcile different notions of time across your data when generating training data. For each feature you can specify the timestamp that is most convenient or correct for that feature, and Tecton's join logic takes care of joining all of your features together.

<img src="https://docs.tecton.ai/v2/assets/docs/examples/point-in-time-correct-joins.png" width="50%" />
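The core idea of a point-in-time join can be sketched with pandas `merge_asof`: for each spine row, take the most recent feature value whose timestamp is at or before the row's timestamp, matched on the join key. This is a simplified illustration on made-up data, not Tecton's implementation; it ignores details a production system has to handle, such as how long a feature value stays valid.

```python
import pandas as pd

# Spine: one row per training event (made-up data)
spine = pd.DataFrame({
    "USER_ID": ["u1", "u1", "u2"],
    "TIMESTAMP": pd.to_datetime(["2022-04-10 12:00", "2022-04-14 12:00", "2022-04-14 12:00"]),
    "IS_FRAUD": [0, 1, 0],
})

# Feature values, each stamped with the time it became available (made-up data)
features = pd.DataFrame({
    "USER_ID": ["u1", "u1", "u1", "u2"],
    "TIMESTAMP": pd.to_datetime(["2022-04-09", "2022-04-13", "2022-04-15", "2022-04-15"]),
    "TRANSACTION_SUM_24H": [3, 7, 12, 5],
})

# merge_asof requires both frames to be sorted by the "on" column
training = pd.merge_asof(
    spine.sort_values("TIMESTAMP"),
    features.sort_values("TIMESTAMP"),
    on="TIMESTAMP",
    by="USER_ID",          # match rows on the join key
    direction="backward",  # only use feature rows at or before each event
)
print(training)
```

The `2022-04-15` values are never used, even though they exist in the feature table, because they lie in the future relative to every event; `u2` gets `NaN` since it had no feature value yet. Passing `tolerance=pd.Timedelta(...)` to `merge_asof` would additionally drop matches that are too old.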

## 4) Get Real-Time Features for Inference

### 4.1) Generate an API token

To fetch real-time (online) features at low latency for a production application we will use Tecton's REST API.

This will require creating an API key. In your terminal, run:

✅ `$ tecton api-key create`

Then set this API key as an environment variable, replacing `<key>` with the generated API key:

✅ `$ export TECTON_API_KEY=<key>`

### 4.2) Retrieve online features using the Python SDK

We can hit Tecton's REST API directly from the Python SDK using `fs.get_online_features(join_keys=keys)`. This is convenient for testing purposes.

✅ To query the REST API from the Python SDK, we need to set the API key in the first line of the cell below. Replace `<key>` with the token generated in the step above.

```python
tecton.conf.set("TECTON_API_KEY", "<key>")

# Values for the Feature Service's join keys (USER_ID and CATEGORY)
keys = {
    'USER_ID': 'user_461615966685',
    'CATEGORY': 'grocery_net'
}
features = fs.get_online_features(join_keys=keys).to_dict()
pprint(features)
```

### 4.3) Retrieve features directly from the REST API via cURL

We can also directly query Tecton's REST API using the example cURL below.

✅ Run this in your terminal, replacing `<your-cluster>` in the first line with your cluster name. The final `| jq` pretty-prints the JSON response and requires `jq` to be installed.

```bash
curl -X POST --silent https://<your-cluster>.tecton.ai/api/v1/feature-service/get-features \
     -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "feature_service_name": "fraud_detection_feature_service",
    "join_key_map": {
      "USER_ID": "user_461615966685",
      "CATEGORY": "grocery_net"
    },
    "workspace_name": "prod"
  }
}' | jq
```
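If you would rather call the endpoint from Python without the Tecton SDK, the same request can be built with the standard library. This sketch only constructs the request, mirroring the URL, `Authorization` header, and JSON body of the cURL example above; `build_get_features_request` is a hypothetical helper, and actually sending the request requires a real cluster name and API key.

```python
import json
import os
import urllib.request


def build_get_features_request(cluster: str, api_key: str, join_key_map: dict,
                               feature_service_name: str = "fraud_detection_feature_service",
                               workspace_name: str = "prod") -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request mirroring the cURL example."""
    url = f"https://{cluster}.tecton.ai/api/v1/feature-service/get-features"
    body = {
        "params": {
            "feature_service_name": feature_service_name,
            "join_key_map": join_key_map,
            "workspace_name": workspace_name,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Tecton-key {api_key}"},
        method="POST",
    )


req = build_get_features_request(
    cluster="<your-cluster>",
    api_key=os.environ.get("TECTON_API_KEY", "<key>"),
    join_key_map={"USER_ID": "user_461615966685", "CATEGORY": "grocery_net"},
)
# To send it (requires network access and a valid key):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```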

## 5) What's Next

Tecton is a powerful tool to build, manage, share, and consume features for ML.  Check out the next tutorial "Creating Features on Snowflake" to learn how to build your own features.