# 1. Introduction to Tecton on Snowflake

## 1) Setup

Before getting started, let's do some setup to get your computer ready to interact with Tecton.

### 1.1) Install the Tecton CLI on your local machine

<div class="alert alert-block alert-warning">
Tecton requires Python version 3.8 to run. We also recommend installing Tecton into a Python virtual environment.
</div>

To install the Tecton CLI and other dependencies on your local machine, run the following command in this folder:

✅ `$ pip install -r requirements.txt`

If you run into any issues, follow [these instructions in the Tecton Docs](https://docs.tecton.ai/docs/setting-up-tecton/development-setup/installing-the-tecton-cli) to set up the Tecton CLI.

Once you have finished installing the CLI, you can log in to your Tecton instance using the command below (replace `<your-instance>` with your instance name):

✅ `$ tecton login <your-instance>.tecton.ai`

### 1.2) Clone the Tecton Sample Repository

This tutorial will use [a sample repository full of pre-built features and data sources](https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo).

Before you get started, clone this repository to your local machine using:

✅ `$ git clone https://github.com/tecton-ai-ext/tecton-snowflake-feature-repo.git`

### 1.3) Configure your environment with Snowflake Credentials

You'll need to set three environment variables to connect to Snowflake. We recommend storing them in a file called `.env`:
* SNOWFLAKE_USER: your username in the Snowflake account that you're using with Tecton
* SNOWFLAKE_PASSWORD: your password in the Snowflake account that you're using with Tecton
* SNOWFLAKE_ACCOUNT: the Snowflake account you're using with Tecton (takes the form \<SNOWFLAKE_ACCOUNT\>.snowflakecomputing.com)
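
Once the variables have been loaded into the process environment (for example by `load_dotenv` in section 1.4), a quick check like the one below can confirm that none of them is missing. This is a minimal sketch: the helper name `missing_snowflake_vars` is hypothetical and is not part of Tecton or the sample repository.

```python
import os

# The three variable names listed above.
REQUIRED_VARS = ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT")


def missing_snowflake_vars(env=None):
    """Return the required variable names that are unset or empty in `env`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_snowflake_vars()
if missing:
    print("Missing Snowflake settings: " + ", ".join(missing))
else:
    print("All Snowflake settings are present.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching your real credentials.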

You can uncomment the cell below to create this file, or create it manually in this directory.

In [1]:
# %%writefile .env
# SNOWFLAKE_USER="<YOUR_SNOWFLAKE_USER>"
# SNOWFLAKE_PASSWORD="<YOUR_SNOWFLAKE_PASSWORD>"
# SNOWFLAKE_ACCOUNT="<SNOWFLAKE_ACCOUNT>"

### 1.4) Import some packages and check that Tecton is installed

✅ Run the cell below. It will infer the Snowflake credentials from the configuration you set earlier.

In [None]:
# Import Tecton and other libraries
import logging
import os
import tecton
from dotenv import load_dotenv, find_dotenv
import pandas as pd
import snowflake.connector
from datetime import datetime, timedelta
from pprint import pprint

load_dotenv(find_dotenv())  # take environment variables from .env.

logging.getLogger('snowflake.connector').setLevel(logging.WARNING)
logging.getLogger('snowflake.snowpark').setLevel(logging.WARNING)

In [None]:
connection_parameters = {
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "warehouse": "TRIAL_WAREHOUSE",
    # Tecton requires a database and schema so it can create various temporary objects
    "database": "TECTON",
    "schema": "PUBLIC",
}
conn = snowflake.connector.connect(**connection_parameters)
tecton.snowflake_context.set_connection(conn) # Tecton will use this Snowflake connection for all interactive queries


# Quick helper function to query snowflake from a notebook
# Make sure to replace with the appropriate connection details for your own account
def query_snowflake(query):
    df = conn.cursor().execute(query).fetch_pandas_all()
    return df

print("dotenv location: " + find_dotenv())
tecton.version.summary()

## 2) Interacting with Tecton
Your Tecton account has been seeded with data and some example features that you can use to test out Tecton.

First, you can check out some of the raw data that has been connected to Tecton -- historical transactions.  You'll notice we first select the [Tecton workspace](https://docs.tecton.ai/docs/introduction/tecton-concepts#workspace) that contains the objects we want to fetch.


In [None]:
# Check out the data source in Snowflake
ws = tecton.get_workspace('prod')
ds = ws.get_data_source('transactions')
ds.summary()

### 2.1) Preview the raw data directly

In [None]:
# Preview the data directly
transactions_query = '''
SELECT 
    *
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS 
ORDER BY TIMESTAMP DESC
LIMIT 50
'''
transactions = query_snowflake(transactions_query)
transactions.head(5)

### 2.2) Tecton Feature Views

In Tecton, features are registered as [Feature Views](https://docs.tecton.ai/docs/defining-features/feature-views/).  These views contain all of the information needed to transform raw data (like transactions) into features.

Let's run the "Merchant Fraud Rate" Feature View to view feature data from the 40 days leading up to a chosen date (sorted by the merchants with the highest fraud rate):

In [None]:
fv = ws.get_feature_view('merchant_fraud_rate')

today = "2023-01-01 00:00:00"  # change to today's date, or to the latest date shown in the results of section 2.1

start_time = datetime.strptime(today, '%Y-%m-%d %H:%M:%S')-timedelta(days=40)
end_time = datetime.strptime(today, '%Y-%m-%d %H:%M:%S')

features = fv.run(start_time=start_time, end_time=end_time).to_pandas()

features.sort_values(by="IS_FRAUD_MEAN_3D_1D", ascending=False).head(5)

## 3) Generating Training Data
Once you've built a number of features, you'll want to join them together to generate training data. 

### 3.1) Tecton Feature Services
In Tecton, features that are needed for training or predictions are grouped together into a [Feature Service](https://docs.tecton.ai/docs/defining-features/feature-services). Typically you have one Feature Service per ML model. Let's check out a Feature Service that we've already built.

In [None]:
fs = ws.get_feature_service('fraud_detection_feature_service')
fs.summary()

The `fraud_detection_feature_service` comprises 13 features that are meant to be used together to train a fraud detection model.

### 3.2) Building a Spine

Let's use the `fraud_detection_feature_service` to train a model that scores transactions as either "Fraudulent" or "Non-Fraudulent". To start, let's look up some labeled transactions that we'll use for training.

We can see in the summary above that the `fraud_detection_feature_service` requires `USER_ID` and `CATEGORY` join keys in order to fetch all the relevant features. Together with an event timestamp and label column, this represents our list of historical training events. In Tecton we call this a "spine".

See the [documentation](https://docs.tecton.ai/docs/reading-feature-data/reading-feature-data-for-training/constructing-training-data) for more context on creating training data with Tecton.
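
To make the shape of a spine concrete, the sketch below assembles a spine query from its parts: the join keys, the event timestamp, and the label column. The helper `build_spine_query` is hypothetical and written only for illustration. It does no SQL escaping, so it assumes trusted, hard-coded identifiers.

```python
def build_spine_query(table, join_keys, timestamp_key, label, limit=1000):
    """Build a spine query selecting join keys, event timestamp, and label.

    Hypothetical helper: identifiers are interpolated as-is, so only pass
    trusted, hard-coded names.
    """
    columns = [*join_keys, timestamp_key, label]
    column_list = ",\n    ".join(columns)
    return (
        f"SELECT\n    {column_list}\n"
        f"FROM\n    {table}\n"
        f"ORDER BY {timestamp_key} DESC\n"
        f"LIMIT {int(limit)}"
    )


spine_query = build_spine_query(
    table="TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS",
    join_keys=["USER_ID", "CATEGORY"],
    timestamp_key="TIMESTAMP",
    label="IS_FRAUD",
)
print(spine_query)
```

Extra columns such as `MERCHANT` can be carried along in the spine as well; the sketch keeps only the join keys, timestamp, and label.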

In [None]:
# Preview the label data directly
transactions_query = '''
SELECT 
    MERCHANT,
    USER_ID,
    CATEGORY,
    TIMESTAMP,
    IS_FRAUD
FROM 
    TECTON_DEMO_DATA.FRAUD_DEMO.TRANSACTIONS 
ORDER BY TIMESTAMP DESC
LIMIT 1000
'''
transactions = query_snowflake(transactions_query)
transactions.head(5)

### 3.3) Getting Training Data with `get_historical_features`

To retrieve training data, we'll use Tecton's `get_historical_features` API, which allows us to join the 13 features contained in `fraud_detection_feature_service` onto our historical transactions.
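
As a conceptual aid, the sketch below assumes the join behaves like an as-of (point-in-time) lookup: for each spine row, take the most recent feature value recorded at or before the event timestamp. That assumption, the helper `as_of_lookup`, and the sample values are illustrative only and do not come from this tutorial; Tecton's actual implementation is not shown here.

```python
from bisect import bisect_right
from datetime import datetime


def as_of_lookup(history, event_time):
    """Return the latest value in `history` recorded at or before `event_time`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None when no value exists yet at `event_time`.
    """
    timestamps = [ts for ts, _ in history]
    position = bisect_right(timestamps, event_time)
    return history[position - 1][1] if position else None


# Hypothetical feature history for a single join key.
fraud_rate_history = [
    (datetime(2023, 1, 1), 0.01),
    (datetime(2023, 1, 2), 0.03),
    (datetime(2023, 1, 3), 0.02),
]

# An event at noon on January 2 sees the value recorded on January 2,
# not the later value from January 3.
print(as_of_lookup(fraud_rate_history, datetime(2023, 1, 2, 12)))
```

Under this assumption, a training row never sees feature values recorded after its own event time.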


A Feature Service expects a spine in the form of either a pandas DataFrame or a Snowflake query that generates the events, as shown below.

In [None]:
training_data = fs.get_historical_features(spine=transactions_query, timestamp_key="TIMESTAMP").to_pandas()
training_data.head(10)

## 4) Getting Real-Time Features for Inference

### 4.1) Authenticating with an API key
Follow [these instructions](https://docs.tecton.ai/docs/reading-feature-data/reading-feature-data-for-inference/reading-online-features-for-inference-using-the-python-sdk-for-testing) to get an API key for retrieving real-time features.


### 4.2) Retrieve online features using the Python SDK

We can call Tecton's REST API directly from the Python SDK using `fs.get_online_features(join_keys=keys)`. This method is convenient for testing purposes.

✅ To query the REST API from the Python SDK, we need to set the API key in the first line of the cell below. Replace the `"..."` placeholder with the API key generated in the step above.

In [None]:
tecton.conf.set("TECTON_API_KEY", "...")

keys = {
    'USER_ID': 'user_461615966685',
    'CATEGORY': 'grocery_net'
}
features = fs.get_online_features(join_keys=keys).to_dict()
pprint(features)

### 4.3) Retrieve features directly from the REST API via cURL

We can also query Tecton's REST API directly using the example cURL command below.

✅ Run this in your terminal, but make sure to replace `<your-cluster>` in the first line with your cluster name:

```bash
curl -X POST --silent https://<your-cluster>.tecton.ai/api/v1/feature-service/get-features \
     -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "feature_service_name": "fraud_detection_feature_service",
    "join_key_map": {
      "USER_ID": "user_461615966685",
      "CATEGORY": "grocery_net"
    },
    "workspace_name": "prod"
  }
}' | jq
```
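
The same request can be assembled in Python using only the standard library. The sketch below mirrors the URL, header, and JSON body of the cURL command above. The helper name `build_get_features_request`, the `my-cluster` value, and the explicit `Content-Type: application/json` header are assumptions made for illustration. The request is only built here, not sent.

```python
import json
import urllib.request


def build_get_features_request(cluster, api_key, join_keys,
                               feature_service="fraud_detection_feature_service",
                               workspace="prod"):
    """Build (but do not send) a POST request mirroring the cURL example."""
    url = f"https://{cluster}.tecton.ai/api/v1/feature-service/get-features"
    payload = {
        "params": {
            "feature_service_name": feature_service,
            "join_key_map": join_keys,
            "workspace_name": workspace,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Tecton-key {api_key}",
            # Assumption: the body is declared as JSON; the cURL example sets no content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_get_features_request(
    cluster="my-cluster",  # hypothetical cluster name; replace with your own
    api_key="<key>",       # placeholder; replace with your API key
    join_keys={"USER_ID": "user_461615966685", "CATEGORY": "grocery_net"},
)
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(request)` and parse the response body with `json.load`.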

# What's Next

Tecton is a powerful tool to build, manage, share, and consume features for ML.  Check out the next tutorial "Creating Features on Snowflake" to learn how to build your own features.