<a href="https://colab.research.google.com/github/tecton-ai/demo-notebooks/blob/main/Tecton_Quickstart_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📚 Tecton Quickstart Tutorial
---

##### 💡 **NOT YET A TECTON USER?**

Sign up at [explore.tecton.ai](https://explore.tecton.ai) for a free account that lets you try out this tutorial and explore Tecton's Web UI.

---

Tecton helps you build and productionize real-time ML models by making it easy
to define, test, and deploy features for training and serving.

Let’s see how quickly we can build a real-time fraud detection model and bring
it online.

In this tutorial we will:

1. Connect to data on S3
2. Define and test features
3. Generate a training dataset and train a model
4. Productionize our features for real-time serving
5. Run real-time inference to predict fraudulent transactions

This tutorial is expected to take about 30 minutes (record time for building a
real-time ML application 😎).

---

##### 💡 **TIP**

Most of this tutorial is intended to be run in a notebook. Steps that must be run in your terminal are marked explicitly.

---

## ⚙️ Install Pre-Reqs

First things first, let's install the Tecton SDK and other libraries used in this tutorial by running the cell below.


In [1]:
!pip install 'tecton[rift]>=0.9.0' gcsfs s3fs scikit-learn -q

[0m[31mERROR: Ignored the following yanked versions: 0.2.13, 0.3.6, 0.5.3[0m[31m
[0m[31mERROR: Ignored the following versions that require a different python version: 0.0.20 Requires-Python ==3.7.*; 0.0.21 Requires-Python ==3.7.*; 0.0.22 Requires-Python ==3.7.*; 0.0.23 Requires-Python ==3.7.*; 0.0.24 Requires-Python ==3.7.*; 0.0.25 Requires-Python ==3.7.*; 0.0.26 Requires-Python ==3.7.*; 0.0.27 Requires-Python ==3.7.*; 0.0.28 Requires-Python ==3.7.*; 0.0.29 Requires-Python ==3.7.*; 0.0.30 Requires-Python ==3.7.*; 0.0.31 Requires-Python ==3.7.*; 0.0.32 Requires-Python ==3.7.*; 0.0.33 Requires-Python ==3.7.*; 0.0.34 Requires-Python ==3.7.*; 0.0.35 Requires-Python ==3.7.*; 0.0.36 Requires-Python ==3.7.*; 0.0.37 Requires-Python ==3.7.*; 0.0.38 Requires-Python ==3.7.*; 0.0.39 Requires-Python ==3.7.*; 0.0.40 Requires-Python ==3.7.*; 0.0.41 Requires-Python ==3.7.*; 0.0.42 Requires-Python ==3.7.*; 0.0.43 Requires-Python ==3.7.*; 0.0.44 Requires-Python ==3.7.*; 0.0.45 Requires-Python ==3.

## ✅ Log in to Tecton

Next we will authenticate with your organization's Tecton account.

If you just signed up via `explore.tecton.ai`, you can leave this step as is. If your organization has its own Tecton account, replace `explore.tecton.ai` with your account URL.

*Note: You need to press `enter` after pasting in your authentication code.*

In [None]:
import tecton

tecton.login('explore.tecton.ai') # replace with your URL

Let's then run some basic imports and setup that we will use later in the tutorial.

In [None]:
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregation
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta

tecton.set_validation_mode("auto")
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

Now we're ready to build!

## 🔎 Examine raw data

First let's examine some historical transaction data that we have available on
S3.

In [None]:
import pandas as pd

transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})

display(transactions_df.head(5))

## 👩‍💻 Define and test features locally

In our data, we see that there's information on users' transactions over time.

Let's use this data to create the following features:

- A user's average transaction amount over 1, 3, and 7 days.
- A user's total transaction count over 1, 3, and 7 days.
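
To make these definitions concrete, here is a minimal pandas sketch of trailing-window averages and counts on a tiny hand-made table. The `events` data, the `trailing_metrics` helper, and the output column names are hypothetical illustrations and not part of the Tecton SDK. Tecton's own aggregations work on fixed aggregation intervals, so its values can differ from this per-event rolling window.

```python
import pandas as pd

# Hypothetical toy events; the real tutorial data lives in transactions.pq.
events = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1", "u2"],
        "timestamp": pd.to_datetime(
            ["2022-01-01 10:00", "2022-01-02 09:00", "2022-01-05 12:00", "2022-01-02 08:00"]
        ),
        "amount": [10.0, 30.0, 50.0, 7.0],
    }
)


def trailing_metrics(events: pd.DataFrame, days: int) -> pd.DataFrame:
    """Per-user mean and count of `amount` over the trailing `days` days."""
    # Time-based rolling windows need a sorted datetime index.
    ordered = events.sort_values("timestamp").set_index("timestamp")
    rolling = ordered.groupby("user_id")["amount"].rolling(f"{days}D")
    metrics = pd.DataFrame(
        {
            f"amount_mean_{days}d": rolling.mean(),
            f"amount_count_{days}d": rolling.count(),
        }
    )
    return metrics.reset_index()


print(trailing_metrics(events, days=3))
```

Each output row describes the window that ends at that event's timestamp, so the third `u1` transaction no longer sees the first two once they fall outside the 3-day window.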

To build these features, we will define a "Batch Source" and "Batch Feature
View" using Tecton's Feature Engineering Framework.

A Feature View is how we define our feature logic and give Tecton the
information it needs to productionize, monitor, and manage features.

Tecton's [development workflow](https://docs.tecton.ai/docs/the-feature-development-workflow) allows you
to build and test features, as well as generate training data entirely in a
notebook! Let's try it out.

In [None]:
transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# An entity defines the concept we are modeling features for
# The join keys will be used to aggregate, join, and retrieve features
user = Entity(name="user", join_keys=["user_id"])

# We use Pandas to transform the raw data and Tecton aggregations to efficiently and accurately compute metrics across raw events
# Feature View decorators contain a wide range of parameters for materializing, cataloging, and monitoring features
@batch_feature_view(
    description="User transaction metrics over 1, 3 and 7 days",
    sources=[transactions],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(function="mean", column="amount", time_window=timedelta(days=1)),
        Aggregation(function="mean", column="amount", time_window=timedelta(days=3)),
        Aggregation(function="mean", column="amount", time_window=timedelta(days=7)),
        Aggregation(function="count", column="amount", time_window=timedelta(days=1)),
        Aggregation(function="count", column="amount", time_window=timedelta(days=3)),
        Aggregation(function="count", column="amount", time_window=timedelta(days=7)),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)]
)
def user_transaction_metrics(transactions):
    return transactions[["user_id", "timestamp", "amount"]]

## 🧪 Test features interactively

Now that we've defined our Feature View, we can use
`get_features_in_range` to produce a range of feature values and check out the
data.

In [None]:
start = datetime(2022, 1, 1)
end = datetime(2022, 2, 1)

df = user_transaction_metrics.get_features_in_range(start_time=start, end_time=end).to_pandas()

display(df.head(5))

## 🧮 Generate training data

We'll build our training dataset from labeled historical transactions and try to
predict the "is_fraud" column for a given transaction.

First, let's load our label dataset, which indicates whether a transaction in our historical dataset was fraudulent.

In [None]:
training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})
display(training_labels.head(5))

Let's join our transactions dataset to our label dataset (on the `transaction_id` column) to produce a set of training events we'll then use to generate our training data.

In [None]:
training_events = training_labels.merge(transactions_df, on=['transaction_id'], how='left')[['user_id', 'timestamp', 'amount', 'is_fraud']]
display(training_events.head(5))
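
A left join keeps every label even when no matching transaction exists, which would leave `user_id`, `timestamp`, and `amount` empty for that row. As a quick sanity check, the `indicator=True` option of `merge` flags such rows. The sketch below uses small made-up tables; `labels` and `txns` are hypothetical stand-ins for `training_labels` and `transactions_df`.

```python
import pandas as pd

# Hypothetical stand-ins for the label and transaction tables.
labels = pd.DataFrame({"transaction_id": ["t1", "t2", "t3"], "is_fraud": [0, 1, 0]})
txns = pd.DataFrame(
    {
        "transaction_id": ["t1", "t2"],
        "user_id": ["u1", "u2"],
        "timestamp": pd.to_datetime(["2022-01-01", "2022-01-02"]),
        "amount": [12.5, 99.0],
    }
)

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
merged = labels.merge(txns, on="transaction_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} label(s) without a matching transaction")

# Keep only labels that found their transaction.
events = merged[merged["_merge"] == "both"][["user_id", "timestamp", "amount", "is_fraud"]]
print(events)
```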

Next, let's ask Tecton to join the features we just created into our labeled
events. Tecton will perform a
[time travel join](https://docs.tecton.ai/docs/reading-feature-data/reading-feature-data-for-training/constructing-training-data#a-note-on-point-in-time-correctness)
to fetch point-in-time correct feature values.

To do this we will create a "Feature Service" which defines the list of features
that will be used by our model.

We can call `get_features_for_events(training_events)` on the Feature Service to
get historically accurate features for each event.
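
To illustrate what point-in-time correctness means, the following sketch reproduces the idea with pandas' `merge_asof` on hand-made data: each event picks up the latest feature value computed at or before its own timestamp, never a later one. The table contents and the `feature_time` column name are assumptions made for this example; it is an analogy for the join Tecton performs, not its implementation.

```python
import pandas as pd

# Hypothetical precomputed feature values, each stamped with when it became known.
feature_values = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1"],
        "feature_time": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
        "amount_mean_1d": [10.0, 25.0, 40.0],
    }
)

# Hypothetical labeled events we want features for.
events = pd.DataFrame(
    {
        "user_id": ["u1", "u1"],
        "timestamp": pd.to_datetime(["2021-12-31 08:00", "2022-01-02 12:00"]),
        "is_fraud": [0, 1],
    }
)

# merge_asof requires both frames to be sorted on their time keys.
joined = pd.merge_asof(
    events.sort_values("timestamp"),
    feature_values.sort_values("feature_time"),
    left_on="timestamp",
    right_on="feature_time",
    by="user_id",
    direction="backward",  # only look at feature values from the past
)
print(joined[["user_id", "timestamp", "amount_mean_1d", "is_fraud"]])
```

The second event receives the value stamped 2022-01-02 rather than the later one from 2022-01-03. The first event predates every feature value and comes back empty, which is one reason a `fillna` step can be needed after such a join.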


In [None]:
from tecton import FeatureService

fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service", features=[user_transaction_metrics]
)

training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas().fillna(0)
display(training_data.sample(5))

## 🧠 Train a model

Once we have our training data set from Tecton, we can use whatever framework we
want for training the model.

In the example below, we'll train a simple Logistic Regression model using
sklearn!

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

df = training_data.drop(['user_id', 'timestamp', 'amount'], axis=1)

X = df.drop("is_fraud", axis=1)
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

num_cols = X_train.select_dtypes(exclude=["object"]).columns.tolist()
cat_cols = X_train.select_dtypes(include=["object"]).columns.tolist()

num_pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

cat_pipe = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="N/A"), OneHotEncoder(handle_unknown="ignore", sparse_output=False)
)

full_pipe = ColumnTransformer([("num", num_pipe, num_cols), ("cat", cat_pipe, cat_cols)])

model = make_pipeline(full_pipe, LogisticRegression(max_iter=1000, random_state=42))

model.fit(X_train, y_train)

y_predict = model.predict(X_test)

print(metrics.classification_report(y_test, y_predict, zero_division=0))

Of course, you can continue iterating on features and retraining your
model until you are ready to productionize.
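
When comparing iterations, a threshold-independent metric such as ROC AUC can complement the classification report, since fraud labels tend to be heavily imbalanced. The sketch below is self-contained and uses synthetic data from `make_classification` rather than the tutorial's training set; the class weights, sample size, and variable names are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a fraud dataset (roughly 5% positives).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=6, weights=[0.95, 0.05], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

clf = LogisticRegression(max_iter=1000, random_state=42).fit(X_tr, y_tr)

# Score with predicted probabilities instead of hard 0/1 labels.
fraud_probability = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, fraud_probability)
print(f"ROC AUC: {auc:.3f}")
```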

## 🚀 Apply your Tecton application to production

Tecton objects get registered via a declarative workflow. Features are defined
as code in a repo and applied to a [workspace](https://docs.tecton.ai/docs/beta/introduction/tecton-concepts#workspace) in a Tecton account using the
Tecton CLI. A workspace is like a project for your team or org and corresponds to a single feature repository.

This declarative workflow enables productionization best practices such as
"features as code," CI/CD, and unit testing.

---

##### ℹ️ **HEADS UP!**

This section requires your organization to have its own Tecton account. But don't fret! If you are a user of `explore.tecton.ai`, we've done these steps for you. You can read through this section and then continue with the rest of the tutorial, picking back up at the "Check on backfilling status" section below.

If you want to productionize your own features with your own data, you can sign up for an unrestricted free trial at [tecton.ai/free-trial](https://tecton.ai/free-trial).

---

### 1. Create a Tecton Feature Repository

Let's switch over from our notebook to a terminal and create a new Tecton
Feature Repository. For now we will put all our definitions in a single file.

✅ Run these commands to create a new Tecton repo.

```bash
mkdir tecton-feature-repo
cd tecton-feature-repo
touch features.py
tecton init
```

### 2. Fill in features.py and enable materialization

✅ Now copy & paste the definition of the Tecton objects you created in your
notebook to `features.py` (copied below).

On our Feature View we've added four parameters to enable backfilling and
ongoing materialization to the online and offline Feature Store:

- `online=True`
- `offline=True`
- `feature_start_time=datetime(2020,1,1)`
- `batch_schedule=timedelta(days=1)`

The offline and online Feature Stores are used for storing and serving feature values for training and inference. For more information, check out [Tecton Concepts](https://docs.tecton.ai/docs/beta/introduction/tecton-concepts#offline-feature-store).

When we apply our changes to a [Live Workspace](https://docs.tecton.ai/docs/beta/introduction/tecton-concepts#workspace), Tecton will automatically kick
off jobs to backfill feature data from `feature_start_time`. Frontfill jobs will
then run on the defined `batch_schedule`.

---

##### ℹ️ **INFO**

Besides the new materialization parameters, the code below is exactly the same as our definitions above. No changes are required when moving from interactive development to productionization!

---


**features.py**

```python
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregation, FeatureService
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta


transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# An entity defines the concept we are modeling features for
# The join keys will be used to aggregate, join, and retrieve features
user = Entity(name="user", join_keys=["user_id"])

# We use Pandas to transform the raw data and Tecton aggregations to efficiently and accurately compute metrics across raw events
# Feature View decorators contain a wide range of parameters for materializing, cataloging, and monitoring features
@batch_feature_view(
    description="User transaction metrics over 1, 3 and 7 days",
    sources=[transactions],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(function="mean", column="amount", time_window=timedelta(days=1)),
        Aggregation(function="mean", column="amount", time_window=timedelta(days=3)),
        Aggregation(function="mean", column="amount", time_window=timedelta(days=7)),
        Aggregation(function="count", column="amount", time_window=timedelta(days=1)),
        Aggregation(function="count", column="amount", time_window=timedelta(days=3)),
        Aggregation(function="count", column="amount", time_window=timedelta(days=7)),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
    online=True,
    offline=True,
    feature_start_time=datetime(2020, 1, 1),
    batch_schedule=timedelta(days=1),
)
def user_transaction_metrics(transactions):
    return transactions[["user_id", "timestamp", "amount"]]

fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service", features=[user_transaction_metrics]
)
```

### 3. Apply your changes to a new workspace

Our last step is to log in to your organization's Tecton account and apply our
repo to a workspace!

✅ Run the following commands in your terminal to create a workspace and apply
your changes:

```bash
tecton login [your-org-account-name].tecton.ai
tecton workspace create [your-name]-quickstart --live
tecton apply
```

```
Using workspace "[your-name]-quickstart" on cluster https://explore.tecton.ai
✅ Imported 1 Python module from the feature repository
✅ Imported 1 Python module from the feature repository
⚠️  Running Tests: No tests found.
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Initializing.
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  + Create Batch Data Source
    name:           transactions

  + Create Entity
    name:           user

  + Create Transformation
    name:           user_transaction_metrics
    description:    Trailing average transaction amount over 1, 3 and 7 days

  + Create Batch Feature View
    name:           user_transaction_metrics
    description:    Trailing average transaction amount over 1, 3 and 7 days
    materialization: 11 backfills, 1 recurring batch job
    > backfill:     10 Backfill jobs 2020-01-01 00:00:00 UTC to 2023-08-16 00:00:00 UTC writing to the Offline Store
                    1 Backfill job 2023-08-16 00:00:00 UTC to 2023-08-23 00:00:00 UTC writing to both the Online and Offline Store
    > incremental:  1 Recurring Batch job scheduled every 1 day writing to both the Online and Offline Store

  + Create Feature Service
    name:           fraud_detection_feature_service

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
 Generated plan ID is 8d01ad78e3194a5dbd3f934f04d71564
 View your plan in the Web UI: https://explore.tecton.ai/app/[your-name]-quickstart/plan-summary/8d01ad78e3194a5dbd3f934f04d71564
 ⚠️  Objects in plan contain warnings.

Note: Updates to Feature Services may take up to 60 seconds to be propagated to the real-time feature-serving endpoint.
Note: This workspace ([your-name]-quickstart) is a "Live" workspace. Applying this plan may result in new materialization jobs which will incur costs. Carefully examine the plan output before applying changes.
Are you sure you want to apply this plan to: "[your-name]-quickstart"? [y/N]> y
🎉 all done!
```

## 🟢 Check on backfilling status

Now that we've applied our features to a live workspace and enabled
materialization to the online and offline store, we can check on the status of
backfill jobs in the Tecton Web UI.

This can be found at the following URL (replace `[your-org-account-name]` and `[your-workspace-name]` with the appropriate values):

[https://[your-org-account-name].tecton.ai/app/repo/[your-workspace-name]/features/user_transaction_metrics/materialization](https://[your-org-account-name].tecton.ai/app/repo/[your-workspace-name]/features/user_transaction_metrics/materialization)

If you are using `explore.tecton.ai`, the URL will be:
[https://explore.tecton.ai/app/repo/prod/features/user_transaction_metrics/materialization](https://explore.tecton.ai/app/repo/prod/features/user_transaction_metrics/materialization)

Once the backfill jobs have completed, we can fetch feature values online!

## ⚡️ Create a function to retrieve features from Tecton's HTTP API

Now let's use Tecton's HTTP API to retrieve features at low latency.

To do this, you will first need to create a new Service Account and give it
access to read features from your workspace.

✅ Head to the following URL to create a new service account (replace "explore" with your organization's account name in the URL as necessary). Be sure to save the API key!

[https://explore.tecton.ai/app/settings/accounts-and-access/service-accounts?create-service-account=true](https://explore.tecton.ai/app/settings/accounts-and-access/service-accounts?create-service-account=true)

✅ If you are using `explore.tecton.ai`, this account will automatically be given the necessary privileges to read features from the "prod" workspace. Otherwise, you should give the service account access to read features from your newly created workspace by following these steps:

1. Navigate to the Service Account page by clicking on your new service account in the list at the URL above
2. Click on "Assign Workspace Access"
3. Select your workspace and give the service account the "Consumer" role

✅ Copy the generated API key into the code snippet below where it says `your-api-key`. Also be sure to replace the workspace and account name with your newly created workspace name and account name if necessary.

In [None]:
import requests


def get_online_feature_data(user_id):
    TECTON_API_KEY = "your-api-key"  # replace with your API key
    WORKSPACE_NAME = "prod" # replace with your new workspace name if needed
    ACCOUNT_URL = "explore.tecton.ai" # replace with your org account URL if needed

    headers = {"Authorization": f"Tecton-key {TECTON_API_KEY}"}

    request_data = {
        "params": {
            "feature_service_name": "fraud_detection_feature_service",
            "join_key_map": {"user_id": user_id},
            "metadata_options": {"include_names": True},
            "workspace_name": WORKSPACE_NAME,
        }
    }

    url = f"https://{ACCOUNT_URL}/api/v1/feature-service/get-features"

    # A timeout keeps the call from hanging indefinitely if the endpoint is unreachable
    response = requests.post(url, json=request_data, headers=headers, timeout=10)
    return response.json()

Now we can use our function to retrieve features at low latency!

In [None]:
user_id = "user_1990251765"

feature_data = get_online_feature_data(user_id)

if "error" in feature_data:
    print("ERROR:", feature_data["error"])
else:
    print(feature_data["result"])
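
Hard-coding an API key in a notebook makes it easy to leak. One possible variation, sketched below, reads the key from an environment variable and separates building the request from sending it, so the payload can be inspected without a network call. The helper name `build_get_features_request` and the use of a `TECTON_API_KEY` environment variable are assumptions of this sketch; the URL path, header format, and payload fields are the ones used in `get_online_feature_data` above.

```python
import os


def build_get_features_request(user_id, workspace="prod", account_url="explore.tecton.ai"):
    """Return (url, headers, payload) for a get-features call without sending it."""
    # Assumption: the key was exported beforehand; falls back to a placeholder.
    api_key = os.environ.get("TECTON_API_KEY", "your-api-key")
    url = f"https://{account_url}/api/v1/feature-service/get-features"
    headers = {"Authorization": f"Tecton-key {api_key}"}
    payload = {
        "params": {
            "feature_service_name": "fraud_detection_feature_service",
            "join_key_map": {"user_id": user_id},
            "metadata_options": {"include_names": True},
            "workspace_name": workspace,
        }
    }
    return url, headers, payload


url, headers, payload = build_get_features_request("user_1990251765")
print(url)
print(payload["params"]["join_key_map"])
```

The result can then be sent with `requests.post(url, json=payload, headers=headers, timeout=10)`.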

## 💡 Create a function to make a prediction given feature data

Now that we can fetch feature data online, let's create a function that takes a
feature vector and runs model inference to get a fraud prediction.

---

##### ℹ️ **INFO**

Typically you'd instead use a model serving API that is hosting your model. Here we run inference directly in our notebook for simplicity.

---

In [None]:
import pandas as pd


def get_prediction_from_model(feature_data):
    columns = [f["name"].replace(".", "__") for f in feature_data["metadata"]["features"]]
    data = [feature_data["result"]["features"]]

    features = pd.DataFrame(data, columns=columns)[X.columns]

    return model.predict(features)[0]
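
The name handling inside `get_prediction_from_model` can be seen in isolation with a hand-written response. The sketch below pairs each returned value with its feature name and applies the same `.` to `__` renaming. The sample response is hypothetical: its shape follows what the function above reads (a list of names under `metadata` and a list of values under `result`), but the feature names and values are invented for illustration.

```python
# Hypothetical response, shaped like what get_prediction_from_model reads.
sample_response = {
    "result": {"features": [42.5, 3]},
    "metadata": {
        "features": [
            {"name": "user_transaction_metrics.amount_mean_1d_1d"},
            {"name": "user_transaction_metrics.amount_count_1d_1d"},
        ]
    },
}


def response_to_feature_dict(feature_data):
    """Map training-style column names to the values returned online."""
    names = [f["name"].replace(".", "__") for f in feature_data["metadata"]["features"]]
    values = feature_data["result"]["features"]
    # Names and values are matched by position, so their lengths must agree.
    if len(names) != len(values):
        raise ValueError("feature names and values are misaligned")
    return dict(zip(names, values))


print(response_to_feature_dict(sample_response))
```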

## ✨ Run inference using features from Tecton

Let's combine these functions and run inference!

We can fetch our online features from Tecton, call our inference function, and
get a prediction.

In [None]:
user_id = "user_1990251765"

online_feature_data = get_online_feature_data(user_id)
prediction = get_prediction_from_model(online_feature_data)

print(prediction)

## 🔥 Create a function to evaluate a user transaction and accept or reject it

Our final step is to use our new fraud prediction pipeline to make decisions and
take action in our application.

In the function below we use simple business logic to decide whether to accept
or reject a transaction based on the model's fraud prediction (0 or 1).

In [None]:
def evaluate_transaction(user_id):
    online_feature_data = get_online_feature_data(user_id)
    is_predicted_fraud = get_prediction_from_model(online_feature_data)

    if is_predicted_fraud == 0:
        return "Transaction accepted."
    else:
        return "Transaction denied."
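
The function above acts on the model's hard 0/1 prediction. If you would rather act on a fraud score, one option is to threshold the predicted probability (for a scikit-learn classifier, `predict_proba(features)[0, 1]`). The `decide` helper and its default threshold of `0.5` are hypothetical choices for this sketch, not values from the tutorial.

```python
def decide(fraud_probability: float, threshold: float = 0.5) -> str:
    """Accept a transaction when its fraud probability is below the threshold."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability < threshold:
        return "Transaction accepted."
    return "Transaction denied."


print(decide(0.12))
print(decide(0.12, threshold=0.1))
```

Lowering the threshold rejects more transactions, trading more false alarms for fewer missed fraud cases.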

## 💰 Evaluate a transaction

Put it all together and we have a single online, low-latency decision API for
our application. Try it out below!

In [None]:
evaluate_transaction("user_1990251765")

## ⭐️ Conclusion

In this tutorial, we quickly built an end-to-end real-time fraud
detection application using features built in Tecton.

We tested our features, built training data sets, productionized features with
engineering best practices, retrieved features online, and made decisions in
real time!

But Tecton can do so much more:

- streaming features
- real-time features
- monitoring
- unit testing
- cataloging and discovery
- access controls
- cost management
- rules engines

...and more.

Next, we recommend checking out our tutorial on
[building streaming features](https://docs.tecton.ai/docs/beta/tutorials/building-streaming-features) to learn more
about how to infuse your models with real-time data using nothing more than
Python!